


THE JOURNAL OF 
EDUCATIONAL PSYCHOLOGY 











Volume XXXI December, 1940 Number 9 





A FUNCTIONAL CONCEPT OF INTELLIGENCE* 


COMMENTS ON OUR CHANGING CONCEPT OF INTELLIGENCE, BY BETH L. 
WELLMAN† 


STANLEY G. DULSKY 
Rochester Guidance Center, Rochester, N. Y. 


In a recent article Miss Wellman summarizes the results of perti- 
nent experimental investigations, indicates the relationship between 
IQ and environment, introduces the concept of “mental stimulation 
value,” and argues for the adoption of a functional concept of 
intelligence. I agree with the necessity for adopting a functional concept of 
intelligence and have amplified her exposition of this view. However, 
I am unable to agree with several of the arguments she advances to 
prove the logical need for adopting such a concept. 

Miss Wellman’s stimulating paper is opened by these statements: 


Results from long-time consecutive studies of intelligence of children are 
demanding certain changes in our concept of intelligence in order that our 
concept conform with the facts. Data showing large changes in IQ have been 
steadily piling up, until they can no longer be summarily waved aside. . . . 

The purpose of this article is to inquire into the significance of such 
changes from the standpoint of the concept of intelligence. What revisions 
in our concept are imperative and what theoretical considerations remain 
untouched by such findings? (p. 97) 





* Studies from the Institute for Juvenile Research, Paul L. Schroeder, M.D., 
Director, Series C, No. 304. 

The writer wishes to express his gratitude to Miss Wellman who, by 
publication of her recent paper, helped crystallize many of the ideas expressed in this 
article, particularly in the section entitled “A Functional Concept of Intelligence.” 

Mr. M. W. Richardson of the University of Chicago contributed many helpful 
suggestions. 

† Journal of Consulting Psychology, Vol. II, July-August, 1938, pp. 97-107. 
All quotations here are taken from this article unless indicated otherwise. 


There follow illustrations of cases of children whose intelligence 
quotients changed considerably over a period of time. 


. . . What bearing have all of these findings [changes in IQ] and suggestions 
upon our concept of intelligence? 

First, they have a real bearing on our concept of the innateness of intelli- 
gence and the limits of change which heredity permits. . . . 

The fact that the concept of innateness must be broadened to allow for 
extreme changes during the lifetime of the child leads into the necessity of 
adopting a functional view of intelligence. The only alternative is to believe 
in what I prefer to call a mystical intelligence. It is mystical because nobody 
has ever demonstrated that it really does exist. . . . (pp. 102, 104) 


Before proceeding to a discussion of such subjects as the “concept 
of innateness,” “structural limitations” of the organism, “mental 
stimulation value,” and a functional concept of intelligence, it might be 
valuable to clear the ground by considering some theoretical and factual 
material about this much-publicized phenomenon—large changes 
in IQ. 


LARGE CHANGES IN IQ 


Large changes in IQ have always been with us. They are expected 
on the basis of the theory of measurement error and they have been 
found empirically. 

For the sake of argument let us assume that the PE of an old 
Stanford-Binet IQ is five points. Then fifty per cent of repeated 
measurements of an individual by this test will vary more than five 
points from the original result. Eighty-two per cent of repeated meas- 
urements would vary ten points or less, ninety-five per cent of repeated 
measurements would vary fifteen points or less (assuming a normal 
distribution of the IQ’s of this individual). On the assumption, then, 
that the PE of an IQ is five points, psychologists would expect to find 
that ninety-five per cent of cases that were re-examined would vary 
fifteen points or less in IQ from the original test result; five per cent 
would vary sixteen points or more.* 
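
The percentages in the paragraph above follow from the normal curve, since one probable error corresponds to about 0.6745 standard deviations. A minimal sketch of the arithmetic, assuming (as the text does) a PE of five IQ points and normally distributed retest deviations; the function name `fraction_within` is illustrative and does not come from the original article:

```python
import math

# One probable error (PE) spans the middle 50 per cent of a normal
# distribution, which corresponds to about 0.6745 standard deviations.
PE_IN_SD = 0.6745

def fraction_within(k_pe: float) -> float:
    """Expected fraction of retests deviating by at most k_pe probable errors."""
    z = k_pe * PE_IN_SD
    return math.erf(z / math.sqrt(2))

PE_POINTS = 5  # assumed PE of an old Stanford-Binet IQ, as in the text

for k in (1, 2, 3, 4):
    share = fraction_within(k)
    print(f"within {k * PE_POINTS:2d} points: {share:.1%}   beyond: {1 - share:.1%}")
```

Under these assumptions about one-half of retests fall within five points, about 82 per cent within ten, and between 95 and 96 per cent within fifteen, while fewer than one per cent fall beyond twenty points.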





* Data obtained by Brown (2) have been analyzed by the writer and yield the 
following results (based on seven hundred six cases re-examined on the old Stanford- 
Binet): In fifty-eight and seven-tenths per cent of the cases there was a change of 
five points or less; in eighty-four per cent, ten points or less; in ninety-four and 
six-tenths per cent, fifteen points or less. These data support the expectancy 
based on a PE of five points in IQ. Nineteen cases shifted twenty-one or more 
points in IQ, one case increased thirty-eight points. 
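
The comparison made in this footnote can be laid out side by side. A rough sketch, assuming normally distributed deviations and a PE of five points; the observed percentages are those reported in the footnote from Brown's data, and the names `observed` and `expected_percent` are illustrative only:

```python
import math

PE_IN_SD = 0.6745  # one probable error in standard-deviation units
PE_POINTS = 5      # assumed PE of the IQ, as in the text

# Observed per cent of the 706 re-examined cases, as reported in the footnote.
observed = {5: 58.7, 10: 84.0, 15: 94.6}

def expected_percent(points: float) -> float:
    """Expected per cent of cases changing by `points` or less under the normal model."""
    z = (points / PE_POINTS) * PE_IN_SD
    return 100 * math.erf(z / math.sqrt(2))

for points, obs in observed.items():
    exp = expected_percent(points)
    print(f"{points:2d} points or less: observed {obs:.1f}%, "
          f"expected {exp:.1f}%, difference {obs - exp:+.1f}")
```

The largest gap is at five points, where the observed share exceeds the expected one by nearly nine percentage points; at ten and fifteen points the two agree within about two points.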







The implications of the probable error of an IQ are known to all 
psychologists. Have we, then, as Miss Wellman claims, been guilty of 
neglecting these large deviations? Have they been “summarily 
waved aside”? Perhaps it would be more generous to say that they 
have not had the attention and study they deserve. Most investigators 
have been concerned with averages and large groups and by necessity 
have had to overlook the extremes of the distributions. This 
situation is similar to one that existed half a century ago at Leipzig— 
Wundt with his emphasis on “laws” and Cattell with his nascent interest 
in “individual differences.” 

Many psychologists have been astonished at the great changes in 
IQ after nursery-school attendance—increases of forty and fifty points. 
Since a change of twenty points in IQ is expected in only a fraction of 
one per cent of the population, is it any wonder that there is astonish- 
ment when groups of children exhibit increases of forty and fifty 
points? However, when we realize that the children in these studies 
were subjected to tremendous changes in environment—similar to 
changes that ordinarily affect only a fraction of one per cent of a nor- 
mal population—then we can regard these children as being representa- 
tive of the tail-end of a distribution of cases ranging from no change to 
great change in environment. 

Miss Wellman states that this phenomenon (large changes in IQ) 
demands a change in “our concept of intelligence.” I do not know 
exactly what is meant by “our concept of intelligence.” There are 
probably several conceptions of intelligence that are held by psycholo- 
gists but, certainly, any one of these should be able to account easily 
for large changes in IQ. Failure to do so would be fatal to any con- 
ception of intelligence. However, no matter what concept of intelli- 
gence is finally adopted, there will always be a probable error of 
measurement, a distribution curve, and a certain percentage of cases 
at the extremes. 

To summarize the foregoing discussion: (1) Large changes in IQ 
(defined here as beyond three PE) are expected on the basis of the the- 
ory of measurement error; (2) large changes in IQ have been found in 
many investigations; (3) usually the emphasis has been on the close 
agreement (ninety-five per cent of the cases) rather than on the disa- 
greement (five per cent of the cases). 


CONCEPT OF INNATENESS 


What bearing have all of these findings [changes in IQ] and suggestions 
upon our concept of intelligence? 

First, they have a real bearing on our concept of the innateness of intelligence 
and the limits of change which heredity permits. . . . It follows that 
the hereditary limits within which change is permitted must be set much 
farther apart than the customary allowance of four or five points for the 
probable error of the IQ. (pp. 102-103) 


What is a concept of innateness of intelligence and what is “our concept 
of the innateness of intelligence”?* What are “the hereditary 
limits within which change is permitted”? And what is meant by 
“customary allowance”? On such important points one could wish 
that Miss Wellman had given us definitions. 

Since I do not know what Miss Wellman means by “our concept of 
the innateness of intelligence,” I can only say that, to me, a concept of 
innateness of intelligence indicates merely that intelligent behavior 
is, to some unknown degree, influenced by hereditary characteristics— 
characteristics that were prepotent in determining the organism in 
embryo. If this point of view is acceptable, and I see no reason why it 
should not be, then no experimental findings can ever require a revision 
of such a viewpoint. 

Let us return to the quotation cited above, the last sentence of 
which requires careful analysis. First, the probable error of the IQ is 
a measure of the amount of error in the IQ and has no particular and 
necessary reference to innateness. Secondly, what basis is there for the 
statement that “the hereditary limits within which change is permitted” 
are conceived by psychologists to be only four or five points? 
Let me quote from Burks (3), who, among the nature-nurture warriors, 
is regarded as the Joan of Arc of the “nature” camp. 


The maximal contribution of the best home environment to intelligence 
is apparently about twenty IQ points, or less, and almost surely lies between 
ten and thirty points. Conversely, the least cultured, least stimulating kind of 
American home environment may depress the IQ as much as twenty IQ 
points. But situations as extreme as either of these probably occur only once 
or twice in a thousand times in American communities. (p. 309) 


If Miss Burks, a proponent of the “nature” group, grants a change of 
twenty points in IQ as a result of environment, on what basis can Miss 
Wellman claim four or five points as a “customary allowance”? 

For the sake of argument let us assume that the “customary 
allowance” is twenty points instead of five. Recent studies have 
indicated changes in IQ of forty and fifty points. If we increase to 
fifty points the twenty points allowed by Miss Burks and then state, 
“Good environments may raise an IQ as much as fifty points; bad 
environments may depress an IQ as much as fifty points,” would that 
statement satisfy Miss Wellman’s contention that “the hereditary 
limits within which change is permitted must be set much farther 
apart . . . ”? 


* Italics not in original. 


STRUCTURAL LIMITATIONS 


Miss Wellman states that there are two factors which determine the 
amount of change in a child’s IQ—the organism and the environment. 
Under the heading, “The Organism,” “structural limitations” is listed. 


The second consideration is structural limitations. In relatively rare 
instances there are physiological conditions of the organism that determine limits 
beyond which the organism cannot go. In this category are birth injuries, brain 
lesions, structural defects of brain or nervous tissue, either present at birth or 
acquired thereafter, and other physical conditions which are serious enough to 
interfere with mental functioning. (p. 101, italics mine) 


The italicized statement above is a contradiction of the theory of 
the physiological limit which merely states that there are definite 
limits, structural and physiological, “beyond which the organism cannot 
go.” The physiological limit applies even when there is no pathology 
of the kind described above. In earlier writings Miss Wellman, 
and Stoddard and Wellman have explicitly recognized the physiological 
limit. It is difficult to understand why, in the article under 
discussion, she has made the statement just quoted. 


. . . Nevertheless, there are probably definite constitutional limits within 
which intellectual growth may be altered. (8, p. 81) 


Again: 

. . . definite limits to mental development are set by heredity. . . . (7, p. 178) 


No one knows when the physiological limit of the organism has 
been attained; studies in learning and work have indicated quite clearly 
that greater efficiency can be obtained by increasing motivation with 
the result that the so-called “physiological limit” gives way to a new 
one. But there is a point beyond which, under most favorable circumstances, 
better performance cannot be obtained. The present world 
record for the one hundred-yard dash is nine and four-tenths seconds. 
It is possible that with improved methods of starting and better shoes 





646 The Journal of Educational Psychology 


this record can be lowered (as previous records have been lowered 
within the past twenty years), but would anyone care to predict that 
the record might be lowered to three seconds without a radical change 
in the structure and physiology of man? 

Let us take from the mental field a similar example of physiological 
limit. Would a child with an IQ of 50, free from birth injury, brain 
lesion, and “structural defects of brain or nervous tissue,” after having 
lived in an average American environment ever test at 150 IQ even if 
placed in an environment of the greatest “mental stimulation value”? 
He might, under the foregoing conditions, test as high as 90 or 100 IQ, 
but he would probably never test at a very superior level. In closing 
this section I would like to quote again from Stoddard and Wellman (7). 


. . . It was generally conceded, even by research workers whose findings were 
somewhat at variance with each other, that the great bulk of mental ability 
as measured by tests comes as a direct inheritance. The real question con- 
cerns the amount of variability which can still be effected by later influ- 
ences. . . . (p. 170) 


I agree whole-heartedly with this statement; but it is not necessary to 
discard the concept of the physiological limit to allow for great changes 
in IQ. 


MENTAL STIMULATION VALUE OF ENVIRONMENT 


The factors that govern distance [number of IQ points of change] can be 
grouped under two headings: the child (or the organism) and the environment 
(considered only from the standpoint of its mental stimulation value, since all 
other aspects of the environment are irrelevant to the problem). (pp. 100- 
101) 


Under the heading, “The Organism,” are listed: Initial IQ, structural 
limitations (discussed earlier in this paper), and psychological 
limitations. The last group includes all factors that make individuals 
of the same IQ psychologically different in receptivity—emotional 
blockings, variations in goals, differences in emotional tensions, etc. 
Three ways in which environments have been demonstrated to differ 
(in terms of mental stimulation value) are listed: General environment 
—home, geographical, and general culture; educational practices and 
procedures; mental level of child group. 

Miss Wellman offers the term, “mental stimulation value” (MSV) 
as an explanatory concept to indicate how certain features of the 
environment influence IQ. It is implied in her discussion that 
increases in IQ are a result of stimulating environments; decreases in 
IQ are the result of non-stimulating environments (physical factors 
ruled out). But nowhere in her discussion is there a clear exposition 
of what is meant by MSV. 

Mental stimulation value is difficult to define, but it is apparent 
that some definition must be framed if the relationship between intelli- 
gence and MSV is to be investigated. At the present time, because of 
lack of a precise definition we are apt to argue in circles about the effect 
of environment on IQ. If the IQ has been raised, we say that there 
has been a change in the MSV of the environment; if no change is 
found we say that there has been no change in the MSV of the environ- 
ment. Now, there should be a measure of MSV, independent of intel- 
ligence, so that change or lack of change in MSV can be correlated 
with changes in IQ. The problem then is: Can we obtain an inde- 
pendent measure of the MSV of an environment? 

MSV is necessarily the result of an interaction between the environ- 
ment and the individual. The environment has the potentiality of 
being mentally stimulating; the organism has the potentiality of being 
mentally stimulated. Then to determine the MSV of an environment 
it would be necessary to determine the environment’s potentiality for 
stimulating and the individual’s potentiality for being stimulated. 

The individual’s potentiality for being stimulated is related to 
intelligence, structural limitations, psychological limitations, age, and 
past experience. One would need a measure of all these variables to 
determine the individual’s potentiality for being stimulated. The 
environment’s potentiality for stimulation might be rated by such 
means as a scale of socio-economic status, plus some evaluation of a 
number of additional items of information which would include the 
“parent practices and procedures” of Wellman, and the attitudes of 
the parents toward the child. 

Now it is obvious that a mere description of a child’s environment 
will tell us very little of its MSV for him. John might be surrounded 
by the best books for children but there is little MSV for him if he does 
not read them or if they are not read to him. A child might have very 
intelligent parents, well educated, and highly cultured, but what 
mental stimulation would he get from them if they rejected him and he 
were so insecure that his receptivity were diminished? 

I have attempted to reduce the concept of MSV to a number of 
discrete measurable items. Since so many of these items cannot be 
measured, and since the interaction between the individual and the 
environment is a dynamic, changing relationship, I do not see how we 
can ever get an independent measure of mental stimulation value. 

Is it necessary to posit MSV to explain the results that have been 
obtained in various investigations? Would it not be simpler to investi- 
gate the relationship between two variables and omit any reference to 
a hypothetical intermediary? This type of an investigation is well-illustrated 
by the title of one of Miss Wellman’s papers, “The Effect of 
Preschool Attendance upon the IQ.” If all other variables are con- 
trolled then we can say logically that the preschool attendance had 
such-and-such an effect on the IQ. The question of why? must wait 
until certain features of the preschool environment have been investi- 
gated. It is my contention that we do not aid our understanding of 
this problem by labelling the environment of a nursery school mentally 
stimulating. 

Careful investigations of preschool attendance might show, for 
example, that the increases in IQ are due to the fact that the child is 
absent from certain deleterious features of his home environment, or to 
the socializing process of playing with other children, or to the superior 
mental level of the children in the school (Wellman’s data indicated 
that the children who were below the average of the group intellectually 
gained the most in IQ). Another possible factor, and one which I 
believe might prove to be the most important, is the relationship that 
develops between child and teacher. Unfortunately, this variable 
defies measurement. Barrett and Koch (1) recognized this factor and 
gave such a vivid picture of it that I quote it. 


We do know, to be sure, that the nursery-school teacher was an intelligent 
sympathetic person, who used every opportunity whose possibilities she 
appreciated to develop the children physically, emotionally, and intellectually. 
Especially did she labor to foster constructive attitudes—attitudes of confidence, 
inquiry, creativeness, respect for property, sympathy, coöperativeness, 
unselfishness, etc. The experience range of the children she tried to 
extend by answering thoughtfully their questions, by taking them on frequent 
walks, field trips, and visits; by telling them appropriate stories; and by 
capitalizing the educational possibilities of the orphanage. Sleep, feeding, 
elimination, and sex habits also received a share of her attention. The 
children’s trust in, and adoration of, their guide were testimonials of a very 
beautiful relationship. (p. 114) 


It is not my purpose here to try to account for the increases in IQ 
that are effected as a result of preschool attendance, but merely to 
indicate that our knowledge will be increased by analytical investigations 
and not by the adoption of concepts whose validity can never be 
subjected to experimental proof or disproof.* 


A FUNCTIONAL CONCEPT OF INTELLIGENCE 


The fact that the concept of innateness must be broadened to allow for 
extreme changes during the lifetime of the child leads into the necessity of 
adopting a functional view of intelligence. The only alternative is to believe 
in what I prefer to call a mystical intelligence. It is mystical because nobody 
has ever demonstrated that it really does exist. (p. 104) 


Earlier in this paper I criticized the concept of innateness and 
indicated that if the definition offered here is accepted no revision of it 
is necessary. But I agree with Miss Wellman on the necessity of 
adopting a functional view of intelligence. Several years ago she 
wrote: 


Certainly intelligence cannot be regarded as static; it should be regarded in 
terms of growth rather than as a fixed quantity. (8, p. 81) 


and again, Stoddard and Wellman (7) wrote: 


Intelligence then is not a thing in itself, but is a way of responding to 
problem situations. (p. 178) 


I shall try to amplify and supplement Miss Wellman’s discussion of 
this topic. 

A functional concept is a concept defined in terms of operation or 
activity; it is a dynamic rather than a static concept.† A functional 
concept of intelligence is one that regards intelligence in terms of 
activity or performance. Intelligence is not something that one has; 
it is a label applied to a mode of behaving or performing. A certain 
type of behavior in a certain situation may be described as intelligent 
behavior. 

Let us analyze the process of testing intelligence to determine 
basically what we are doing. We give an individual certain types of 
problems and tasks to solve and perform: We ask him to define words, 





* Dashiell (4) in his presidential address commented upon the clean-cut experimentation 
of Ebbinghaus: “What neater example could we ask of the positivistic 
spirit—which is, I take it, only another name for an unusually stubborn insistence 
upon our not going beyond the facts we can observe into speculating about ultimate 
existences and causes.” (p. 5) 

† Stevens (6) defines an operation as “. . . the performance which we execute 
in order to make known a concept.” “Operationism consists simply in referring 
any concept for its definition to the concrete operations by which knowledge of the 
thing in question is had.” (p. 323) 

to solve arithmetic problems, to detect absurdities, to repeat a series of 
digits, to give similarities, etc. His responses to these problems and 
questions, his performance, is labelled intelligent behavior. We are 
not measuring intelligence; we are obtaining a sampling of responses 
which are called intelligent. And, furthermore, we are not measuring. 
Measurement requires an origin and equal units, two standards which 
we do not ordinarily have in psychological testing. What we do when 
we “test intelligence” is to compare one individual’s performance with 
the performance of other individuals of the same age or of the same 
educational attainments. 

Performance on an intelligence test is variable—just as variable as 
any other human performance. In addition, the “measurement” of 
this performance is influenced by variable or chance errors. If we 
grant that an individual’s performance—any kind of performance—is a 
function of the interaction between the organism and the environment, 
then some changes in these two variables might influence the perform- 
ance. Let me cite an illustration of a change in the organism influenc- 
ing performance. 

I measure the strength of my grip today by the Smedley dynamometer 
and find that I exert a pressure of 30 kg. I retest the strength 
of my grip a year later after twelve months of exercise of the muscles 
involved in this performance and learn that I now exert a pressure of 
40 kg. This fact does not astonish me or anyone else—the explanation 
offered is that exercise of certain muscles over a specified period of time 
enabled me to exert more pressure on the dynamometer. Suppose 
that, again, a year later I retest the strength of my grip after a period 
of no exercise of these muscles; the reading is the same as that recorded 
two years earlier, 30 kg. Again a simple explanation is offered: The 
effect of the year’s practice on these muscles and nerves has disappeared 
and they are now in a physiological state similar to that they were in 
when I took the first reading. No one criticizes any of my results, no 
one says that the first reading was erroneous or that the second reading 
was erroneous. The results are accepted and the change is explained 
in terms of changes having taken place in the organism. 

How different the situation is in the field of mental testing! I test 
John Jones three times with the Stanford-Binet test and obtain excellent 
coöperation each time. In January, 1936, he had an IQ of 80; in 
1937, an IQ of 100; in 1938, an IQ of 80. Are my results accepted and 
the discrepancies interpreted in terms of changes having taken place 
in John? No. After the second rating I become suspicious of the 
first one, and after the third rating I become suspicious of the second 
one. I doubt the validity of my results. Why? 

The explanation lies in our thinking: We seek, expect to find, and 
usually do find, consistency. As pointed out earlier, ninety-five per 
cent of our cases vary fifteen points or less in IQ on a reexamination. 
It is natural, then, to anticipate, in terms of probability, another rating 
in close agreement with the first. We want to be “scientific,” we want 
consistency, and we become disturbed when our results do not satisfy 
these demands. 

The fact that great changes in IQ are possible and have occurred 
does not mean that our faith in the stability of the IQ need be shattered. 
Unless there has been a marked change in the organism or in the 
environment the second IQ will closely approximate the first. As 
Wellman said: 


But it is now clear that the reason changes are not ordinarily found is that 
no real change has usually been made in the environment. (p. 99) 


English and Killian (5) recognized both the constancy and the 
variability of the IQ: 


A certain amount of constancy and a reciprocal amount of variability under 
“ordinary conditions” is a statistical fact. 


I would like to remark here, parenthetically, that changes in the 
environment influence test performance only by virtue of changes in 
the organism. 

In conclusion, adoption of a functional view of intelligence will do 
three things: (1) It will place us in a more secure position logically 
(intelligence will not be thought of as some mystical entity residing in 
the brain); (2) changes in IQ will be interpreted as results of changes 
in the organism (similar to changes evoked by exercise or lack of it); 
(3) it will make it easier for psychologists to understand, interpret, and 
accept large variations in IQ. 


SUMMARY 


(1) Large changes in IQ (defined here as changes beyond 3 PE) are 
expected according to the theory of measurement error. Large 
changes in IQ have been demonstrated empirically in many studies, 
although the emphasis has usually been on the close agreement 
between the two test ratings rather than on the disagreement. 

(2) If the definition of “concept of innateness” offered here is 
accepted, then there is no need for its revision. 

(3) We do not have to give up the concept of the physiological 
limit to allow for great changes in IQ. 

(4) “Mental Stimulation Value” is a concept of doubtful value 
because it cannot be proved or disproved (how will we ever know if a 
given environment at a given time is, or is not, mentally stimulating?). 

(5) A functional concept of intelligence is a concept that is defined 
operationally; it regards intelligence in terms of activity or perform- 
ance. Performance is variable and is a function of an organism that is 
changing in its dynamic relationship to a changing environment. 
Adoption of a functional view of intelligence will make it easier for 
psychologists to understand, interpret, and accept large variations 
in IQ. 


BIBLIOGRAPHY 


1. Barrett, H. E. and Koch, H. L.: “The Effect of Nursery-School Training upon 
the Mental-Test Performance of a Group of Orphanage Children.” J. Genet. 
Psychol., Vol. XXXVII, 1930, pp. 102-122. 

2. Brown, A. W.: “The Change in Intelligence Quotients in Behavior Problem 
Children.” J. Educ. Psychol., Vol. XXI, 1930, pp. 341-350. 

3. Burks, Barbara Stoddard: “The Relative Influence of Nature and Nurture 
upon Mental Development: A Comparative Study of Foster-Parent Foster- 
Child Resemblance and True-Parent True-Child Resemblance.” In The 
Twenty-Seventh Yearbook of the National Society for the Study of Education. 
Nature and Nurture. Part I. Their Influence upon Intelligence. Bloomington, 
Ill.: Public School Publishing Co., 1928, pp. ix, 465 (pp. 219-316). 

4. Dashiell, J. F.: “Some Rapprochements in Contemporary Psychology.” 
Psychol. Bull., Vol. XXXVI, 1939, pp. 1-24. 

5. English, H. B. and Killian, C. D.: “The Constancy of the IQ at Different Age 
Levels.” J. Consult. Psychol., Vol. III, 1939, pp. 30-32. 

6. Stevens, S. S.: “The Operational Basis of Psychology.” Amer. J. Psychol., 
Vol. XLVII, 1935, pp. 323-330. 

7. Stoddard, G. D. and Wellman, B. L.: Child Psychology. New York: The 
Macmillan Co., 1934. 

8. Wellman, B. L.: “Growth in Intelligence under Differing School Environments.” 
J. Exp. Educ., Vol. III, 1934, pp. 59-83. 








CHILDREN’S INFORMATION AND SUCCESS IN FIRST- 
GRADE READING 


LEIGH PECK AND LILLIAN E. McGLOTHLIN 


The University of Texas 


A number of studies have been reported concerning the general 
information of kindergarten and first-grade children and the relationship 
of such information to early reading achievement (1-7). The results 
of these studies lead to the following general conclusions: (1) Informa- 
tion test scores of children beginning school show a high correlation 
with their intelligence test scores; (2) the socio-economic status of the 
home from which the child comes influences to a marked degree his 
general information; (3) boys have a wider range of information than 
do girls; (4) the relation between children’s general information and 
their success in reading is significant. 

The present investigation deals more specifically with the informa- 
tion that children have concerning the factual content of first-grade 
texts. The relation of reading achievement to such information is 
compared with its relation to mental age, reading-readiness scores, 
socio-economic status, and behavior and personality ratings. 


THE SUBJECTS 


First-grade children were used as subjects. These children were 
enrolled as pupils in three first-grade classes in the Humble Elementary 
School in Humble, Texas, and in two first-grade classes in the S. M. N. 
Marrs School, a rural consolidated school in the Aldine Community 
near Humble. These subjects were one hundred in number: Forty- 
nine girls and fifty-one boys, ranging in age at the beginning of the 
study from six years to ten years seven months, with an average 
chronological age of six years nine months. The study was begun 
with one hundred seventeen children, but because some of these with- 
drew from school before the study was finished, complete data could 
not be secured for the entire group. These children had had no kinder- 
garten training. No selective plan was used in choosing the subjects. 
All pupils enrolled in the classes mentioned above were included in the 
investigation. 


PROCEDURE 


Soon after the beginning of school, the Cole-Vincent Group Intelli- 
gence Test for School Entrants, the Lee-Clark Reading Readiness Test, 
and an information test were given. Later, ratings were made with the 
Sims Score Card for Socio-economic Status and with the Haggerty- 
Olson-Wickman Behavior Rating Schedules. Near the end of the 
school year a standardized reading test was given (Metropolitan 
Achievement Tests, Primary 1 Battery, Form C). A record of each 
child’s marks in reading was kept. 

The information test used was prepared especially for this study. 
It was designed to measure, not the child’s general information, but his 
knowledge of the facts contained in the textbooks he would be expected 
to read. The primers and first readers adopted for use in the Texas 
schools were examined and the information judged necessary for 
comprehension of this subject-matter was listed. Questions and 
picture identifications, intended to test perception of this information, 
were assembled. The first tentative test form was tried out with a 
small group of children who were not used as subjects in the study. 
The final form of the test includes eighty-five items concerned with 
such topics as home and school life, life in the city, life on the farm, 
transportation, the circus, safety-first, nature study, familiar stories, 
and special days. The test was administered individually to the 
subjects during the first month of school. The test is not sufficiently 
exhaustive to show the amount of information possessed by an indi- 
vidual child, but it constitutes a sampling of this knowledge and 
furnishes a basis for comparing him with other children. The ques- 
tions are listed in the Appendix to this report. 


RESULTS 


The discussion which follows will include results from the tests used 
to determine the relation of reading achievement to information, to 
reading readiness, to personality traits, to behavior, to intelligence, and 
to socio-economic status. 

1. Information and Reading Achievement.—The results of the infor- 
mation test indicate a marked lack of uniformity of information among 
these children concerning the subject-matter included within the test. 
The scores ranged from fourteen to eighty points. The mean score for 
the group was 40.8, with a standard deviation of 1.5 for the mean. 
Fifty per cent of all scores fell between 26 and 50; twenty-eight per 
cent, above 50; and twenty-two per cent, below 26. No question was 
answered correctly by all the children, and no question was missed by 
every child. 

The reading test scores on the Metropolitan battery showed a fairly 
normal distribution of reading ages and grades. 







The raw scores made on the three reading tests of the Metropolitan 
series, and the grade ratings representing total reading achievement on 
this test, were each correlated with the scores made on the information 
test. The results of these correlations are presented in Table I. 

The table indicates a higher correlation of information scores with
the total reading achievement (.617 ± .042) than with the subtests
(.490 ± .051, .506 ± .050, and .514 ± .050, respectively). The correlation
of test three with information, though slightly higher than that
of the other subtests, was probably lowered by the large number of zero
scores (twenty-five per cent). The correlations are sufficiently
high to indicate a relationship between this type of information and
reading achievement.
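
The probable errors quoted with these correlations can be checked with a short sketch. It assumes, since the report does not state its formula, that the classical probable error of a correlation coefficient was used, PE = 0.6745(1 - r^2)/sqrt(N), with N = 100 subjects. The function name and the row labels are illustrative only.

```python
import math


def probable_error_of_r(r: float, n: int) -> float:
    """Classical probable error of a correlation: 0.6745 * (1 - r^2) / sqrt(n)."""
    return 0.6745 * (1.0 - r * r) / math.sqrt(n)


# Correlations of the information test with the reading measures (Table I).
# N = 100 is the number of subjects with complete data.
table_i = {
    "Test 1": 0.490,
    "Test 2": 0.506,
    "Test 3": 0.514,
    "Total reading": 0.617,
}

for label, r in table_i.items():
    print(f"{label}: r = {r:.3f}, PE = {probable_error_of_r(r, 100):.3f}")
```

Under this assumption the computed values agree with the probable errors printed in Table I to three decimal places.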


TABLE I.—CORRELATIONS BETWEEN INFORMATION AND READING ACHIEVEMENT

Reading test          Correlation with information      PE
Test 1                            .490                  .051
Test 2                            .506                  .050
Test 3                            .514                  .050
Total reading                     .617                  .042











TABLE II.—CORRELATIONS BETWEEN READING READINESS AND READING ACHIEVEMENT

Reading test          Correlation with reading readiness      PE
Test 1                            .608*                       .043
Test 2                            .626                        .041
Test 3                            .439                        .054
Total reading                     .623                        .041











2. Reading Readiness and Reading Achievement.—The Lee-Clark 
reading readiness test scores ranged from 0 to 49 with a mean score of
25 ± 1.25. Fifty subjects scored above 22, indicating that these
children would probably succeed in first-grade reading; thirty-six 
between 12 and 22, indicating that these children might either fail or 
pass; and fourteen below 12, indicating for these children probable 
failure in first-grade reading. The correlations of these test scores with 
the reading achievement test used follow in Table II. 







The reading-readiness test shows a significantly lower correlation 
with the word-meaning test than with the other reading tests. It is to 
be noted that the reading-readiness test has almost the same correlation 
with the total reading test achievement as does the information test 
(.623 ± .041 as compared with .617 ± .042).

3. Behavior Rating Schedules and Reading Achievement.—The rela- 
tion of the Metropolitan reading achievement test results to the 
behavior and personality ratings on the Haggerty-Olson-Wickman 
schedules is shown in Table III. The negative correlations are due to 
the fact that small scores represent few symptoms of maladjustment 
and, consequently, high ratings. 


TABLE III.—CORRELATIONS BETWEEN READING ACHIEVEMENT AND BEHAVIOR
AND PERSONALITY RATINGS

                              Correlation with reading achievement      PE
Schedule A (behavior)                      -.174                        .065
Schedule B (personality)                   -.604                        .043











The correlation between the reading test scores and the Schedule A
(Behavior) ratings is very low (-.17 ± .065), indicating little or no
relationship between these measures. A study of the score distribu-
tions on the tests, however, yields interesting data. The scores on
Schedule A ranged from 0 to 62 with a mean score of 14 ± 1.07.
Forty-six per cent of the scores were zero, denoting an entire lack of 
behavior problems for these children. Of the forty-six zero scores, 
thirty-eight were made by children who rated average or above on the 
reading tests; and of the forty-seven children scoring highest in reading, 
thirty-six were rated with behavior scores less than 10. However, 
there are marked deviations from this general correspondence. Of 
four children whose behavior scores were 60 or higher, two ranked 
above and two below the average of the group in reading. One pupil, 
ranking among the highest four per cent in reading, was given a 
behavior rating score of 60; and another pupil, rating only grade I’ 
(equivalent to the first month of the first grade) in reading, evidenced 
no behavior problems. The mean behavior scores were highest in the 
first and fourth reading test quartiles; these scores were 15 and 23, 
respectively. However, the average for the first quartile is not
characteristic of the group, but is due to outstanding deviations. Of the
twenty-five best readers, fifteen had behavior scores less than 10, and
of the twenty-five poorest readers, seventeen had behavior scores above
10. The lowest average, a mean score of 4, was observed in the second
reading test quartile; the mean score for the third quartile was 14.

The scores on Schedule B ranged from 38 to 127 and showed a more 
normal distribution than those on Schedule A. Since these ratings 
included a description of the child’s intellectual traits as well as his 
physical, social, and emotional characteristics, a significant negative 
correlation of these scores with reading progress would generally be 
expected. The correlation was found to be -.60 ± .04.

4. Behavior Rating Schedules, Reading Readiness, and Information. 
Table IV contains correlations which show the relationship between 
reading readiness and information scores, and the relation of each of 
these to behavior and personality ratings. The correlation of the
reading-readiness test with the information test is relatively high
(.613 ± .042). Behavior Schedule B (Personality) has a small degree
of negative correlation with both information and reading readiness,
but Schedule A (Behavior) shows little relation to either test. These
correlations may not indicate a relationship among the tests them-
selves, but may suggest a common relationship to intelligence or to
socio-economic status.


TABLE IV.—CORRELATIONS BETWEEN READING READINESS, INFORMATION, AND
RATINGS OF BEHAVIOR AND PERSONALITY

                              Reading readiness       Information
                                 r       PE            r       PE
Schedule A (behavior)          -.116 ± .066          -.183 ± .065
Schedule B (personality)       -.343 ± .059          -.471 ± .052
Information                     .613 ± .042

5. Intelligence and Socio-economic Status.—The interdependence of 
intelligence and socio-economic status and their importance to learning 
is generally recognized. These influences affect the acquisition of 
reading skills not only directly but also indirectly through their 
impress upon other factors which contribute to reading readiness. 
The correlation between intelligence and socio-economic status is 
.096 ± .044. Their separate influences on reading achievement,
information, reading readiness, and behavior and personality ratings 
will be discussed in the following paragraphs. 

6. Intelligence and Reading Achievement.—The chronological ages
of the subjects ranged from six years to ten years seven months; their
mental ages varied from four years eight months to nine years. The
average mental age rating was six years seven months, and the average
intelligence quotient was 98.0 ± 1.36. Intelligence quotients ranged
from 61 to 148. The intelligence quotients of sixty children fell
between 90 and 110. Only three intelligence test scores fell below 70, 
and only two above 120. On account of the limited range of the 
intelligence test, the very high and the very low scores are probably 
unreliable. The low average score is evidently due to the selective 
factor determining the first-grade group; this group included with the 
six-year-old pupils children above seven who had been unable to do 
first-grade work the previous year. 
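
The averages reported here can be related through the ratio definition of the intelligence quotient, IQ = 100 × MA / CA. The following is a rough sketch only: it assumes the ratio definition applies to the test used, and it takes the group averages of six years seven months (mental age) and six years nine months (chronological age) given in this report. The function names are illustrative.

```python
def to_months(years: int, months: int) -> int:
    """Convert an age given in years and months to months."""
    return 12 * years + months


def ratio_iq(mental_age_months: float, chronological_age_months: float) -> float:
    """Ratio IQ: 100 * mental age / chronological age."""
    return 100.0 * mental_age_months / chronological_age_months


mean_ma = to_months(6, 7)  # average mental age reported for the group
mean_ca = to_months(6, 9)  # average chronological age reported for the group

# Ratio of the two group averages; this is not the same quantity as the
# mean of the individual quotients, so only rough agreement is expected.
ratio_of_means = ratio_iq(mean_ma, mean_ca)
print(f"IQ from group averages: {ratio_of_means:.1f}")
```

The quotient formed from the two averages is about 97.5, close to the reported mean quotient of 98.0. Exact agreement is not expected, because the mean of individual ratios generally differs from the ratio of the means.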

Table V gives a summary of the relations of intelligence to reading
achievement and to the factors discussed above as contributing to
preparedness for beginning reading. Mental age appears to be of
greater consequence than intelligence quotient in influencing reading
readiness, information, and reading achievement. For this group
chronological age appears to have no significance for reading achieve-
ment, and only little or none for information; the correlation with the
reading readiness scores is sufficiently high to suggest a relationship
between them. The correlation of intelligence quotient ratings with
information scores is higher than with reading readiness; and the
correlation of mental age ratings with reading readiness is higher than
with information. However, mental ages show a higher correlation
with both reading readiness and information than do intelligence
quotients. The correlations indicate a negative relationship between
intelligence and the presence of behavior problems and between
intelligence and personality maladjustment.


TABLE V.—CORRELATIONS BETWEEN INTELLIGENCE AND READING ACHIEVEMENT,
INFORMATION, READING READINESS, AND BEHAVIOR AND PERSONALITY RATINGS

                                   CA                 MA                 IQ
                                r      PE          r      PE          r      PE
Reading achievement           .044 ± .067        .622 ± .041        .483 ± .052
Information                   .114 ± .066        .773 ± .027        .590 ± .044
Reading readiness             .388 ± .057        .866 ± .017        .470 ± .052
Schedule A (behavior)                                              -.356 ± .059
Schedule B (personality)                                           -.559 ± .046

7. Socio-economic Status and Reading Achievement.—The Sims
Score Card yielded an average score of 7.3 ± .5 for this group. Most
of the children were from middle-class homes. Ten of the poorest
homes were rated 0, and the home considered best received a rating of
23; others rated between these scores. Table VI contains some
correlations involving socio-economic status.

The table shows the correlation of socio-economic standing with 
personality adjustment to be greater than that with reading achieve- 
ment, reading readiness, information, or behavior. The correlations, 
however, indicate a relationship between socio-economic status and 
behavior, reading achievement, and information. Reading readiness 
appears to be influenced least by home background. 

In summing up the relative effect of intelligence and socio-economic 
status upon reading achievement and upon reading readiness factors, 
it may be said that reading achievement, information, and reading 
readiness seem to be more dependent upon mental age than upon 
socio-economic conditions; behavior and personality adjustment seem 
equally dependent upon intelligence and upon socio-economic status. 
Information seems more dependent upon socio-economic standing 
than does reading readiness; reading readiness seems more dependent 
upon mental age than does information. 


TABLE VI.—CORRELATIONS OF SOCIO-ECONOMIC STATUS WITH READING
ACHIEVEMENT, INFORMATION, READING READINESS, AND BEHAVIOR AND
PERSONALITY RATINGS

                              Correlation with socio-economic status      PE
Reading achievement                          .332                        .059
Information                                  .477                        .052
Reading readiness                            .227                        .064
Schedule A (behavior)                       -.368                        .058
Schedule B (personality)                    -.593                        .044








8. Correlation with Teachers’ Marks.—Using teachers’ marks as 
another measure of reading success, interesting findings were made 
through correlations with other reading factors. The correlations are 
shown in Table VII. 

The correlation between teachers’ marks and the reading achieve-
ment test is high (.70). The correlations with personality adjustment
are nearly the same for both reading measures (-.595 with teachers’
marks and -.604 with the achievement test). The correlations of
information and reading readiness are higher with the achievement test
than with teachers’ marks; they are for information .617 ± .042 and
.553 ± .047, and for reading readiness .623 ± .041 and .545 ± .048
with the two reading measures, respectively. Socio-economic status
and behavior correlated higher with teachers’ marks than with the
achievement test scores. The correlation of socio-economic status
with teachers’ marks was .426 ± .055 and with the achievement scores
.332 ± .059; the correlation of behavior scores with teachers’ marks
was -.318 ± .060 and with achievement scores -.174 ± .065.


TABLE VII.—CORRELATIONS BETWEEN TEACHERS’ MARKS AND ACHIEVEMENT
TESTS, INFORMATION, READING READINESS, BEHAVIOR AND PERSONALITY
RATINGS, AND SOCIO-ECONOMIC STATUS

                              Correlation with teachers’ marks      PE
Achievement test                           .700                    .034
Information                                .553                    .047
Reading readiness                          .545                    .048
Schedule A (behavior)                     -.318                    .060
Schedule B (personality)                  -.595                    .044
Socio-economic status                      .426                    .055











9. Sex Differences in Reading Achievement, Information, and Reading 
Readiness.—The mean scores for boys’ and girls’ intelligence quotients 
were essentially the same (98.5 ± 1.36 as compared with 98.0 ± 1.36).
The girls’ average chronological age was two months greater than that 
of the boys. Their average reading readiness score was two points 
higher, and their reading grade, as measured by the achievement test, 
the grade equivalent of one month above that of the boys. The boys 
made an average score one point higher than that made by the girls 
on the information test. Correlations were computed for the boys and 
girls separately between the reading test and the information test, the 
reading-readiness test, chronological age, and mental age. These 
findings are presented in Table VIII. 

There is little difference in the correlations of scores made by the 
boys and by the girls. The boys’ information and mental age scores 
tend to correspond more closely to their reading scores than do those 
of the girls; whereas, the girls’ reading readiness scores correlate higher 
with reading achievement than do those of the boys. The difference 
between the correlations, however, was nowhere found to be high 
enough to indicate a significant difference between the boys and girls. 










The greatest difference between the sexes was found in the case of the 
correlations between the reading-readiness test and the achievement 
test. In this case there is indication of eighty chances in one hundred 
that the obtained difference is significant, and that reading readiness 
(as measured by the Lee-Clark test) correlates more highly with read- 
ing achievement for the girls than for the boys. 
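
The figure of eighty chances in one hundred can be approximately reproduced from the values in Table VIII. The sketch below rests on assumptions the report does not state: that the probable error of the difference between two independent correlations is the square root of the sum of the squared probable errors, and that the ratio of the difference to its probable error is referred to the normal curve. The function name is illustrative.

```python
import math


def chances_in_100(r1: float, pe1: float, r2: float, pe2: float):
    """Return (difference / PE of difference, chances in 100 of a true difference).

    Assumes independent groups and a normal sampling distribution.
    """
    diff = abs(r1 - r2)
    pe_diff = math.sqrt(pe1 ** 2 + pe2 ** 2)
    ratio = diff / pe_diff
    # One probable error equals 0.6745 standard deviations.
    z = 0.6745 * ratio
    probability = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return ratio, 100.0 * probability


# Reading readiness with reading achievement: girls .677 ± .053, boys .577 ± .064.
ratio, chances = chances_in_100(0.677, 0.053, 0.577, 0.064)
print(f"difference / PE = {ratio:.2f}; chances in 100 = {chances:.0f}")
```

On these assumptions the difference is about 1.2 times its probable error, which corresponds to roughly 79 chances in 100, in line with the figure given in the text.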


TABLE VIII.—SEX DIFFERENCES IN CORRELATIONS OF READING ACHIEVEMENT
WITH OTHER VARIABLES

                          Correlation with reading achievement
                              Boys                 Girls
                            r      PE            r      PE
Information               .579 ± .063          .539 ± .068
Reading readiness         .577 ± .064          .677 ± .053
Chronological age         .042 ± .095          .051 ± .095
Mental age                .634 ± .057          .607 ± .060


SUMMARY 


(1) The information measured by this test is comparable in signifi- 
cance for reading achievement to mental age and reading readiness 
scores. Four factors—information, reading readiness, mental age, and 
personality adjustment—seem most significant in influencing reading 
success, having correlations of .617, .623, .622, and -.604, respectively,
with reading achievement. 

(2) Intelligence quotients and socio-economic scores show an inter- 
mediate degree of relationship to reading achievement, having correla- 
tions of .483 and .332, respectively, with the reading test used. 

(3) Chronological age and overt behavior problem scores appear to 
have little significance for reading success, having correlations of only 
.044 and -.174, respectively, with reading achievement.

(4) Information scores seem to be more affected by mental age than 
by socio-economic status, with correlations of .773 and .477, respec- 
tively, and, similarly, reading readiness scores more by intellectual 
maturity than by socio-economic status, with correlations of .866 and 
.227, respectively. 

(5) Boys have slightly more information than girls, and girls score
a little higher on reading-readiness tests than boys. The boys’
achievement scores tend to correlate higher with the information test
than do those of the girls (.579 as compared with .539), and the girls’
achievement scores correlate higher with the reading readiness test
than do those of the boys (.677 as compared with .577).

(6) For this group, teachers’ marks have a rather high correlation 
with the reading achievement test (.70). Information and reading 
readiness correlate more highly with the reading test (.617 and .623, 
respectively) than with teachers’ marks (.553 and .545, respectively); 
but socio-economic status and behavior ratings correlate more highly 
with teachers’ marks (.426 and -.318, respectively) than with achieve-
ment scores (.332 and -.174, respectively). Personality ratings have
a correlation of -.604 with achievement scores and a correlation of
-.595 with teachers’ marks.


BIBLIOGRAPHY 


(1) Garrison, K. C.: “The relative influence of intelligence and socio-cultural status
upon the information possessed by first-grade children.” J. Soc. Psychol.,
Vol. iii, 1932, pp. 362-367.

(2) Hall, G. Stanley: “The contents of children’s minds on entering school.” Ped.
Sem., Vol. i, 1891, pp. 140-173.

(3) Hildreth, G. H.: “Information tests of first-grade children.” Childhood Ed.,
Vol. ix, 1933, pp. 416-420.

(4) Hillard, George H. and Troxell, Eleanor: “Informational background as a
factor in reading readiness and reading progress.” Elem. Sch. J., Vol.
xxxviii, 1937, pp. 255-263.

(5) Probst, Cathryn: “A general information test for kindergarten children.”
Child Develpm., Vol. ii, 1931, pp. 81-95.

(6) Sangren, Paul V.: “Information tests for young children.” Childhood Ed.,
Vol. vi, 1929, pp. 70-77.

(7) Wilson, Frank T.: “Correlation of information with other abilities and traits
in grade I.” Elem. Sch. J., Vol. xxxvii, 1936, pp. 295-301.


APPENDIX 


INFORMATION TEST

I. Home and School

1. What are these toys? (Picture of top, drum.)
2. How many blocks do you see? (Picture of child playing with blocks, a total of nine blocks in three piles.)
3. Where would you play with a sled?
4. How would you care for a pet?
5. If you live in the city, why must your dog wear a collar and a tag?
6. When you go to a birthday party, what do you take?
7. What do children do at a party?
8. What does “invite” mean?
9. Did you ever go on an errand? What did you do?
10. Why do people have telephones?
11. Why do we have clocks?
12. Which clock tells you when to go to bed? (Picture of three clock dials.)
13. Which one says three o’clock?
14. What does this one say?
15. What is a kitchen?
16. What is a cellar?
17. What is the attic?

II. The City

1. Who are these men? (Picture of postman, fireman, milkman.)
2. What does a baker do?
3. What would you see at a fire station?
4. What is a siren?
5. If you buy a dozen eggs, how many is that?
6. Which is more money, a nickel or a dime?
7. To what kind of store would you go to buy a new suit?
8. What is an elevator?
9. If you go to the top of a high building, how would the people on the street look to you?

III. The Farm

1. What are these animals? (Picture of hog, sheep, calf.)
2. Where do they live?
3. What is a ranch?
4. Why does a farmer keep chickens?
5. Where do vegetables grow?
6. Where do apples grow?
7. From what is flour made?
8. Where do we get honey?
9. Where does wool come from?
10. What is a fair?
11. What does it mean to get a blue ribbon?

IV. Transportation

1. Where is the engine? (Picture of train.)
2. When you ride on the train, who takes your ticket?
3. What does the baggage man do?
4. If you ride on a boat, where could you play?
5. Who is the man who drives the airplane?
6. Where are airplanes kept?

V. The Circus

1. Where would you see a tent like this? (Picture of circus tent.)
2. Who are these men? (Picture of four men playing band instruments.)
3. What are these animals? (Picture of camel, giraffe, two lions, a bear.)
4. Why do some animals live in cages?
5. What do tigers eat?
6. What is a parade?
7. Who is the funny man? (Picture of clown.)
8. What is the big animal? (Picture of elephant.)
9. Point to the elephant’s paw.
10. Point to his trunk.
11. Point to his head.

VI. Safety First

1. Who is this man? (Picture of traffic policeman with two children.)
2. What does he do?
3. What are the colors? (Picture of traffic light.)
4. What does the red light mean? the green light?
5. If you meet someone, on which side of the street must you go?

VII. Nature Study

1. Where is the root? (Picture of plant.)
2. Where is the stem?
3. Where is the blossom?
4. Where is the bulb?
5. Where is the leaf?
6. Where do squirrels live?
7. How do they get ready for winter?
8. Where do bears live in winter?
9. How do the trees look in autumn?
10. Where do butterflies come from?
11. What are tadpoles?
12. When can you see your shadow?
13. Which way is east?
14. What is a mountain?
15. What is a thermometer?
16. What is a magnet?

VIII. Stories and Special Days

1. Who is this boy? (Picture of Little Boy Blue.)
2. Do you know a verse about him?
3. Who is the little girl? (Picture of Little Bo-Peep.)
4. Do you know a verse about her?
5. Who is Goldenhair?
6. Do you know any riddles?
7. Why do you like Christmas?
8. Who is George Washington?
9. What do children do on Mother’s Day?
10. What is a jack-o-lantern?





REVIEW, WITH SPECIAL REFERENCE TO TEMPORAL 
POSITION 


A. M. SONES AND J. B. STROUD 


State University of Iowa 
INTRODUCTION 


The question of when to review or relearn raises a problem that is
of importance both in pedagogy and in theoretical psychology. Typi- 
cally, educational psychology advises students and teachers to review 
school materials comparatively soon after they have been learned. 
Moreover, if, perchance, the student wishes to indulge in several review 
exercises, frequent reviewings are recommended for a few days imme- 
diately after learning with increasingly infrequent reviewings there- 
after. Thus, a kind of negative acceleration of temporal spacings is 
advised. 

It has seemed logical to several educational psychologists that, in 
view of the negatively accelerated character of the curve of forgetting, 
initial reviews should appear early and be frequent. It is said that, by 
this procedure, forgetting may be defeated. This may be true if 
certain kinds of review are employed. However, one wishes to know 
whether or not it is more economical to maintain knowledge in this 
manner than it is to relearn it after a considerable portion has been 
forgotten; or, whether new associations profit more from repetition 
than old ones. 

In so far as the writers are aware, Lyon (1914) was the first to 
recommend the arrangement of reviews advocated above. His state- 
ment follows: 


In general it was found that the most economical method for keeping 
material once memorized from disappearing was to review the material when 
it started to “fade.” Here also the intervals were found to be, roughly speak-
ing, in arithmetic proportions, as one day, two days, four days, eight days,
etc. For similar reasons the student is advised to review his “lecture-notes”
shortly after taking them, and, if possible, to review them again the evening 
of the same day. Then the lapse of a week or two does not make nearly so 
much difference. When once he has forgotten so much that the various 
associations originally made have vanished, a considerable portion of the 
material is irretrievably lost. (p. 163) 
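
The spacing Lyon describes can be made concrete with a small sketch. It assumes, since the quotation does not say, that the figures "one day, two days, four days, eight days" are the intervals between successive reviews rather than days counted from the original learning. Each interval in the quoted series is double the one before it. The function name and parameters are illustrative.

```python
def doubling_review_days(first_interval_days: int = 1, n_reviews: int = 4):
    """Return (intervals between reviews, days after learning on which they fall).

    Assumes each interval is twice the preceding one, as in the series
    one, two, four, eight days.
    """
    intervals = [first_interval_days * 2 ** k for k in range(n_reviews)]
    days = []
    elapsed = 0
    for gap in intervals:
        elapsed += gap  # each review falls one interval after the previous one
        days.append(elapsed)
    return intervals, days


intervals, days = doubling_review_days()
print("intervals between reviews:", intervals)
print("days after original learning:", days)
```

On this reading, four reviews would fall on the first, third, seventh, and fifteenth days after learning. If the figures were instead meant as days counted from learning, the reviews would fall on the first, second, fourth, and eighth days.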


Lyon appears to have drawn the foregoing conclusions from
empirical (though very limited) data rather than deducing them logically
from the Ebbinghaus curve. Both of his statements are, in themselves,
defensible, without, however, necessarily giving warrant to the
generalization that one or more review exercises introduced soon after
learning are any more effective than a similar number introduced at a
later time. Naturally, in order to keep material from disappearing,
review should come before it begins to “fade.” Likewise, in the case of
lecture-notes, it is desirable to review them before they get “cold.”
But it does not follow from this that a repetition of the lecture the 
following day would be any more efficacious than such a repetition one 
week or two weeks later; nor does it follow that if chapters in textbooks 
and collateral material are reviewed by a rereading, such a review 
should be indulged in on the same day or even on the following day, 
rather than at some later date. 

In 1920, Starch made the following guarded statement about 
review: 


The experimental work on forgetting is too limited as yet to permit of 
much definite application in the practical procedure of learning school material. 
The one suggestion that may possibly be made would be this: Since the rate 
of forgetting is very rapid at first and more gradual later on, it probably would 
be highly advantageous to have relearning of a given material come very 
frequently at first and more rarely later on. (p. 158) 


Jordan voices the same idea in 1933 (p. 185). It should be noted 
that these pronouncements of Lyon, Starch, and others are, apparently, 
not predicated upon any particular method of reviewing. This article 
attempts to show that there are two general kinds of review, recall and 
relearning, and that experimental conditions do not affect them 
similarly. 

Owing to the close similarity between review, by relearning, and 
distributed practice, it should be possible to predict from the experi- 
ments pertaining to the latter that such review would prove to be more 
effective when introduced after a lapse of some time than when 
introduced immediately. That is, the experimental findings with 
respect to distributed practice argue for considerable delay before 
reviewing, when some form of relearning, as opposed to recall, is used 
as the medium of review. 

Bridge (1934) attacked the problem under consideration by
introducing review exercises of various kinds: (1) just after a lesson,
and (2) twenty-four hours later, or just before the next lesson. She
used such methods of review as filling in blanks, oral questioning,
writing on selected problems, and listening to summary lectures. Her
criterion test was a one-hundred-item test composed of five subtests.

A small advantage was obtained in favor of a review exercise just 
before the next lesson, that is, twenty-four hours after learning. Cer- 
tainly nothing in her data gives any evidence that a review imme- 
diately after a learning exercise is more effective than one twenty-four 
hours later. The results are the same whether the criterion test was 
administered one week or four weeks after learning. Moreover, her 
results were uniform for all four of her review procedures. 

Peterson et al. (1935) report findings relative to the relationship
between the effectiveness of review and its temporal position. Using 
specially prepared learning material, six pages in length, they found a 
rereading review introduced seven days after the learning exercise to be 
equally effective with such a review introduced two or three days after 
learning. Similarly, review one day and nine days after learning were 
equally efficacious. Retention tests, essay type, were administered ten 
and twenty-one days after original learning. Groups were equated, in 
advance, on the basis of the Iowa Silent Reading tests. 

Spitzer (1939) has published the results of an investigation of the 
relative effectiveness of varying temporal positions of review, using a 
multiple-choice test as the medium of review. Thirty-six hundred 
sixth-grade pupils served as subjects. Various groups reviewed, that 
is, took the test, on the first, seventh, fourteenth, and twenty-first days 
after the original study exercise. These groups, together with their 
controls, were given a final criterion test twenty-eight and sixty-three 
days after learning. “Recall,” he says, “in the form of a test is an
effective method of aiding retention and learning and should therefore
be employed more frequently in the elementary school. Since recall is
most effective immediately after learning has ceased, the first recall
should be given before much time has elapsed.”

This is a crucial point. Spitzer has demonstrated for the first time 
in a major experiment, in so far as the writers’ information goes, that 
recall employed as a review exercise, indeed recall in the form of an 
objective test so employed, is more effective when introduced relatively 
soon after learning. The experiment reported in this paper corroborates
this finding. Anticipating further the results of the present
investigation, it may be said also that they confirm those previously
reported, in that, when rereading is the instrumentality of review, no
advantage arises in favor of any temporal position ranging from one to
seventeen days. Thus, it appears that review is not a single
phenomenon whose effectiveness depends upon a single set of
circumstances. In so far as the principles governing its effectiveness are
concerned, there are at least two kinds of review: Recall, on the one 
hand, and relearning in the form of rereading, relistening, and the like, 
on the other. 


PROCEDURE 


This experiment was designed to compare three temporal positions 
each of two kinds of review. The review periods, all ten minutes in 
length, appeared on the first and third days (Position I), the eighth and 
fifteenth days (Position II), and the fifteenth and seventeenth days 
(Position III). Thus, days one and three, together, comprise one 
temporal position; days eight and fifteen, another; and days fifteen and 
seventeen, another. The effectiveness of two kinds of review, reread- 
ing and the taking of a multiple-choice test, is determined for each of 
the three temporal positions, the amount of time devoted to each kind 
being constant. The relative effectiveness of the three positions and 
two kinds is gauged by comparing scores achieved on a forty-item, 
four-response test administered, in each case, forty-two days after the 
original learning exercise. 

The learning material consisted of a seventeen-hundred-and-fifty- 
word article, arranged in mimeographed booklets, on the history of 
paper and the methods of making paper. The subjects were allowed 
twenty minutes in which to read and study this material. Table I 
presents a schematic arrangement of the experiment. 


TABLE I.—ARRANGEMENT OF THE EXPERIMENT

                         Review                        Final test
Group     Kind           Position in days              Days after
                         after original study          original study
  1       Rereading          1 and 3                        42
  2       Testing            1 and 3                        42
  3       Rereading          8 and 15                       42
  4       Testing            8 and 15                       42
  5       Rereading         15 and 17                       42
  6       Testing           15 and 17                       42

In an experiment of this kind there arises the problem of where to 
place the final criterion test. It may be given at the conclusion of the
review exercise, as was done in one experiment; it may be given an
equal number of days following each review; or, it may be given an 
equal number of days from the original learning. Under the last 
arrangement the time elapsing between the early reviews and the 
retention test is greater than that elapsing between the late reviews 
and the retention test. Any one of these arrangements is legitimate; 
and ultimately all of them should be tried out. 

Peterson, et al., met this problem by varying the temporal positions 
both of the reviews and of the retention test as follows:

                                        Ten-day recall    Twenty-day recall
Review one day after learning                20.4               20.1
Review nine days after learning              19.9               20.3

From the mean scores presented, one gains the impression that the 
time interval between reviewing and testing for retention is not impor- 
tant. In the present study the retention test was administered 
forty-two days after learning; thirty-nine days after the last review in 
the case of the earliest position studied; twenty-five days after review 
in the case of the latest temporal position. It was assumed that after 
a lapse of twenty-five days, a few additional days should not make 
much difference. 
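
The intervals just cited follow directly from the design. A minimal sketch in Python recomputes the days elapsing between the last review at each position and the final test; the names `REVIEW_DAYS` and `days_from_last_review` are chosen here for illustration and do not come from the paper.

```python
# Days (after original study) on which each position's two reviews fell,
# as given in the procedure; the dictionary name is illustrative only.
REVIEW_DAYS = {"I": (1, 3), "II": (8, 15), "III": (15, 17)}
FINAL_TEST_DAY = 42  # final criterion test, days after original study


def days_from_last_review(position: str) -> int:
    """Return the days elapsing between the last review and the final test."""
    return FINAL_TEST_DAY - max(REVIEW_DAYS[position])


for position in REVIEW_DAYS:
    print(position, days_from_last_review(position))
```

The earliest position leaves thirty-nine days and the latest twenty-five, as stated; the intermediate position, by the same arithmetic, leaves twenty-seven.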

Approximately thirteen hundred seventh-grade pupils in forty-eight 
classes in eleven public schools participated in this experiment. In 
each classroom the regular teacher was in charge of the experiment; 
explicit written instructions were furnished each teacher who partici- 
pated. As is noted above, the subjects were divided into six experi- 
mental groups. This was done by having the pupils count off by sixes. 
All pupils who counted ‘“‘one’’ became permanent members of Methods 
Group 1; all who counted ‘‘two” became permanent members of 
Methods Group 2, and so on. This procedure yielded six methods 
groups within each class, selected at random, comprised of about two 
hundred twenty-five subjects each. Moreover, it is assumed to have 
yielded groups which, for experimental purposes, may be regarded as 
equal. 
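
The counting-off procedure can be sketched as follows. This is only an illustration of the assignment rule described above, written in Python with a hypothetical roster; the function name `count_off` and the class size of twenty-seven are assumptions, not details from the study.

```python
from collections import defaultdict


def count_off(class_roster, n_groups=6):
    """Assign pupils to methods groups 1..n_groups in counting-off order."""
    groups = defaultdict(list)
    for index, pupil in enumerate(class_roster):
        # The first pupil counts "one", the second "two", and so on,
        # starting again at "one" after every n_groups pupils.
        groups[index % n_groups + 1].append(pupil)
    return dict(groups)


# Hypothetical class of 27 pupils, identified only by seat number.
roster = [f"pupil_{seat}" for seat in range(1, 28)]
groups = count_off(roster)
for number in sorted(groups):
    print(number, len(groups[number]))
```

Within any one class the six groups can differ in size by at most one pupil, which is what lets each class serve as a complete small-scale replication of the experiment.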

The six experimental methods, three positions and two kinds, were 
employed in each of the forty-eight intact classes. All subjects studied 
and read the learning material on the same day and all took the final
criterion test of retention at the same time, forty-two days later. The
subjects of Group 1 reread the material on the first and third days after 
learning; those in Group 2 took a thirty-item review test on each of 
these same days. The members of the various classes who were 
assigned to Groups 3 and 4 reviewed by rereading (3) or by taking a
test (4) on the eighth and fifteenth days; the various members who were
assigned to Groups 5 and 6 reviewed by rereading (5) or by taking a 
test (6) on the fifteenth and seventeenth days. It is clear that only 
one-third of the pupils of a given class engaged in review on a given 
day, one-sixth by rereading and one-sixth by taking a test. Foil 
exercises were provided for the remaining two-thirds of each class on 
each day in which reviewing was done. 


RESULTS 


The results in terms of mean scores for the six methods groups, two 
kinds and three positions for each kind, are shown in Table II and, 
graphically, in Figure 1.


[Figure 1: graph of mean scores for the two kinds of review (one curve is
labeled "Testing review") at POSITION I (Days 1 and 3), POSITION II
(Days 8 and 15), and POSITION III (Days 15 and 17).]

Figure 1. Effectiveness of two kinds of review at three temporal positions.

Although all of the differences between methods means are small, 
certain of them are statistically significant, as determined by the 
method of analysis of variance. As is stated in connection with Table 
II, a difference of 1.39 in mean score is taken as a statistically
significant difference (99/100 probability). A comparatively small tendency
is noted for rereading reviews to become more effective as the reviews
are placed farther away from the original learning exercise. Inspection
of the means of Groups 1, 3, and 5 bears out this statement. The
differences among these three groups are not, however, significant. On
the other hand, there is a tendency for the effectiveness of testing
reviews to decrease from the “early” to the “late” positions. This
statement is supported by comparison of the mean scores of Groups 2,
4, and 6. The differences between Groups 2 and 4, and 2 and 6, are
significant. 


TABLE II.—METHODS MEANS

Position     Days after                   Kind of       Mean score on test
of review    original study    Group      review        42 days after
                                                        original study
    I          1 and 3           1        Rereading          16.00
                                 2        Testing            17.79
   II          8 and 15          3        Rereading          16.41
                                 4        Testing            15.68
  III         15 and 17          5        Rereading          16.92
                                 6        Testing            15.16

A difference of 1.39 in mean score represents 99/100 probability of a true methods
difference, by the method of analysis of variance.


Moreover, a statistically significant advantage obtains in favor of 
testing reviews over rereading reviews when the review exercises are 
placed on the first and third days. A statistically significant advan- 
tage obtains in favor of rereading review over testing review when the 
exercises appear on the fifteenth and seventeenth days. If we are 
dealing with a true function, not a chance factor, we should expect the 
advantage of the testing review, obtained for the earliest position, to 
increase with further decrease in the time between learning and review- 
ing; and the advantage of rereading review, obtained for the inter- 
mediate and late positions, to increase with further increase of the time 
intervening between learning and reviewing.
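
The comparisons drawn above can be checked mechanically against the means of Table II. The sketch below, in Python, simply compares each pair of means that share a position or a kind of review with the stated critical difference of 1.39; it does not reproduce the analysis of variance itself, and the names used are illustrative.

```python
from itertools import combinations

# Mean scores from Table II, keyed by (position, kind of review).
MEANS = {
    ("I", "rereading"): 16.00, ("I", "testing"): 17.79,
    ("II", "rereading"): 16.41, ("II", "testing"): 15.68,
    ("III", "rereading"): 16.92, ("III", "testing"): 15.16,
}
CRITICAL_DIFFERENCE = 1.39  # difference the paper takes as significant


def is_significant(a, b):
    """True when two methods means differ by at least the critical difference."""
    return abs(MEANS[a] - MEANS[b]) >= CRITICAL_DIFFERENCE


for a, b in combinations(MEANS, 2):
    same_position = a[0] == b[0]
    same_kind = a[1] == b[1]
    if same_position or same_kind:
        diff = MEANS[a] - MEANS[b]
        print(a, b, round(diff, 2), is_significant(a, b))
```

On these figures the testing review exceeds the rereading review by 1.79 at Position I, and the rereading review exceeds the testing review by 1.76 at Position III; the difference of 0.73 at Position II falls short of 1.39.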

It should be said in respect to the methods of experimentation 
utilized in this investigation and in those of Spitzer, of Peterson, et al., 
and of others, that the degree of learning actually attained is relatively 
small. The criterion test employed in this experiment consisted of 
forty four-response items. Assuming that all four of the possible 
responses to each item are equally plausible, our groups should have 
made a mean score of ten by chance. It is probable that larger differ- 
ences between the means of the methods groups would have been 
obtained had the amount of initial learning been greater. By taking 
the subjects in smaller groups and by using some kind of machine
scoring, it should be possible to set up a condition in which each S is
required to continue studying and taking tests alternately until he
answers some minimum number of items correctly; ninety-five per cent
of them, for example. Under such circumstances the S’s would have 
much to forget and should be more sensitive to variations in review 
procedure. 
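
The chance expectation mentioned above can be written out explicitly; the symbols $n$ and $k$ are introduced here for illustration only. With $n = 40$ items and $k = 4$ equally plausible responses per item, the expected score from guessing alone is

$$E(\text{score}) = n \cdot \frac{1}{k} = 40 \cdot \frac{1}{4} = 10 .$$

The methods means reported in Table II range from 15.16 to 17.79, that is, roughly five to eight items above this chance level.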

The scores of the six experimental groups on the retention test were 
analyzed by the use of the procedure known as “analysis of variance.”
The familiar large-sampling methods for determining the standard 
error of a difference between mean scores are regarded as appropriate 
only when the samples used are strictly random samples of a given 
population. In the present investigation the six groups were not 
random samples of seventh-grade pupils, but rather they were random 
samples of pupils within forty-eight classes in eleven schools. The 
analysis of variance affords an appropriate technique for the attaining 
of an estimate of the reliability of the methods means. The investiga- 
tion was treated as consisting of forty-eight separate experiments, all 
conducted identically in as many classes. 
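
The paper does not give its computing routine. As a rough sketch of the idea of treating each class as a separate replication, the Python function below carries out a randomized-block analysis of variance on a classes-by-methods table, assuming one score (for instance, a group mean) per class and method. The function name and the small table of scores are invented for illustration and are not the study's data.

```python
def randomized_block_anova(table):
    """Analysis of variance for a classes-by-methods table of scores.

    `table[i][j]` is the score (here, a class mean) for class i under
    method j.  Classes are treated as blocks, so class-to-class
    differences are removed before the methods are compared.
    """
    n_classes = len(table)
    n_methods = len(table[0])
    grand = sum(sum(row) for row in table) / (n_classes * n_methods)
    class_means = [sum(row) / n_methods for row in table]
    method_means = [sum(row[j] for row in table) / n_classes
                    for j in range(n_methods)]

    ss_methods = n_classes * sum((m - grand) ** 2 for m in method_means)
    ss_classes = n_methods * sum((c - grand) ** 2 for c in class_means)
    # Residual (methods-by-classes) sum of squares.
    ss_error = sum(
        (table[i][j] - class_means[i] - method_means[j] + grand) ** 2
        for i in range(n_classes) for j in range(n_methods)
    )
    df_methods = n_methods - 1
    df_error = (n_classes - 1) * (n_methods - 1)
    f_ratio = (ss_methods / df_methods) / (ss_error / df_error)
    return {"ss_methods": ss_methods, "ss_classes": ss_classes,
            "ss_error": ss_error, "F": f_ratio}


# Made-up scores for three classes and two methods (not the study's data).
example = [[10, 12], [11, 15], [12, 15]]
result = randomized_block_anova(example)
print(result)
```

Because the class sum of squares is set aside, differences among classes and schools do not inflate the error term against which the methods means are judged.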

The present experiment and the experiments cited above lead to the 
conclusion that the effectiveness of rereading reviews is not influenced 
appreciably by the temporal position at which they are introduced, but 
that the effectiveness of a testing review varies inversely with the 
remoteness of the review exercise from the position of original learning. 
These findings suggest that review in the form of recall and review in 
the form of relearning (rereading, etc.) are not similarly affected by the 
varying of temporal positions. The effectiveness of the latter is not 
subject to the temporal position at which it is given. The former is 
most effective when given relatively soon after learning, but soon loses 
its value. 

Incidentally, the relative effectiveness of the two kinds of review 
is of some practical import in pedagogy. The investigation reported 
in this article was so designed as to afford a comparison between the 
two methods of reviewing. Two or three items of interest may be 
noted. First, it is significant that an objective test serves an important 
function both when used as review and as an element in a learning 
exercise. Miss Luce (1939) found that twenty minutes devoted to 
reading and study and ten minutes to the taking of a multiple-choice 
test yielded superior results to thirty minutes devoted to reading and 
study. The work of Spitzer and that reported by the present authors
show that a multiple-choice test performs important service as review.
Second, in the present investigation, the testing review proved to be
superior to the rereading review when the two kinds of review were 
placed on the first and third days. Third, the reverse condition 
obtained when the reviews were placed on the fifteenth and seventeenth 
days. The differences in both instances are statistically significant. 
An advantage was found in favor of rereading review as opposed to 
testing review when the two kinds appeared on the eighth and fifteenth 
days, but the difference is lacking in statistical significance. The fact 
that recall is an important factor in learning and an important method 
of reviewing could have been predicted from the well-known work of 
Gates (1917). However, the fact that an objective test serves this 
function and that its effectiveness diminishes as the time between 
learning and review increases could not have been predicted. 

When rereading is the medium of review, according to the data 
reported above there is no justification for the claim that reviewing 
should be indulged in relatively soon after learning or that initial 
reviews should be comparatively frequent. When objective tests are 
used for review purposes, there is a significant advantage in favor of 
early reviews. It seems likely that similar results would be obtained 
for other kinds of review that make use of recall. 

What is the bearing of these findings upon the theoretical issues 
raised above? The finding that an objective test, or any kind of recall 
exercise, serves a more important review function when placed early, 
and that its importance diminishes with the passage of time, probably 
signifies that the more one has forgotten the less valuable a recall 
exercise is. It does not argue that the exercising of strong or new 
associations is more efficacious in general than similar exercising of 
weak or old ones. In the case of recall it is often a question of exercis- 
ing an association early or not exercising it at all. Only the finding 
that rereading and other relearning types of review are more effective 
when placed early than late would seem to prove the general case. 

Jost’s (first) law is frequently cited in explanation of the empirical 
advantage of spaced practice as opposed to massed practice. Should 
it be granted that the advantage of spaced practice is an instance of the
law, one could argue, logically, that reviews relatively remote from 
learning should be more effective than those placed at an immediate 
temporal position. However, it seems clear, upon examination of 
Jost’s statement, that the law does not fit precisely the issue involved 
either in the temporal position of review or in distributed practice in
original learning.

Jost’s two laws are as follows, in somewhat free translation: 


(I) If two associations are of equal strength, but different ages, then a new 
repetition has greater value for the older. 

(II) If two associations are of equal strength, but different ages, the older 
decreases less in time. 


It is the first of these two laws that is generally quoted in connection 
with spaced practice. However, it is not directly implicated in dis- 
tributed practice, as, for example, one repetition a day for five days as 
opposed to five consecutive repetitions, for the reason that, in such a 
situation, the strength of an association decreases as its age increases. 
This is analogous to the situation encountered in review. 

However, the established facts relative to the value of distributed 
practice may, conceivably, justify an extension of Jost’s first law. 
Even so, such facts should be construed not as an instance of (or as 
explained by) Jost’s law, but as justifying an extension of it. Should 
it be granted that spaced practice meets the conditions of Jost’s first 
law, the findings with respect thereto should be regarded as a validation 
of the law, not as explainable by it. 

The operation of the first of these laws has been explained as the 
result of perseveration; it is this explanation that has been invoked 
to account for the value of spaced practice. A kind of maturing or 
setting-in process is supposed to follow each experience, repetition, or 
neurophysiological event. As some authors have suggested, if prac- 
tices are massed, there ensues but one perseverative process for the 
entire group, whereas, if a certain amount of time intervenes between 
the several practices, a perseverative process may be realized for each. 
Perseveration implies that some kind or degree of learning takes place 
automatically following active practice. Any increase in perseveration 
is said by its advocates to bring greater returns in the way of learning. 
An arrangement that provides for five perseverative processes in five 
trials or repetitions would naturally be superior to an arrangement that 
provides for but one perseverative process in five trials, if perseveration 
is a genuine phenomenon. 

There appears to be no evidence for perseveration beyond the facts 
that it is invoked to explain. The theory was advanced by Mueller 
and Pilzecker as an explanation of the facts of retroactive inhibition, 
and for many years stood as the principal explanation of the phe- 
nomenon. McGeoch (1933) has shown that the implications of the 
theory are contradicted by empirical data; and Nagge (1935) and
others have adduced a respectable body of evidence in support of
associative interference, a rival explanation of retroactive inhibition. 
Similarly, it appears that perseveration is in no wise essential to the 
operation of Jost’s first law. 

It follows from the negatively accelerated character of the Ebbing- 
haus curve that if two associations are of equal strength, but of differ- 
ent age, they must have been learned unequally, other conditions 
remaining constant. Moreover, Jost’s second law is logically deduci- 
ble from the Ebbinghaus curve and does not, therefore, require empiri- 
cal proof, in all conditions to which the curve applies. This statement 
assumes that forgetting of the individual items in a list follows the 
same general course as that obtained empirically for the whole list. 
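
The two deductions above can be made concrete under an explicit assumption. Suppose, purely for illustration (the paper commits to no formula), that the strength of an association of age $t$ follows a power-law curve of the Ebbinghaus type, $R(t) = A(1+t)^{-b}$, where $A > 0$ stands for the degree of original learning and $b > 0$. If an older and a newer association, of ages $t_1 > t_2$, are of equal strength now, then

$$A_1 (1+t_1)^{-b} = A_2 (1+t_2)^{-b} \quad\Longrightarrow\quad \frac{A_1}{A_2} = \left(\frac{1+t_1}{1+t_2}\right)^{b} > 1,$$

so the older association must have been learned to a higher degree. Its current rate of loss is

$$-\frac{dR}{dt} = \frac{b\,R(t)}{1+t},$$

which, at equal strength $R$, is smaller for the greater age $t_1$. Under the assumed curve this is the content of Jost's second law.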

Jost’s first law may be broken down into two cases and two theo- 
rems, as follows: 

Case 1.—If two sets of material learned to the same criterion, but 
on different occasions, possess the same recall value at a given time 
afterwards, the set learned first is less readily forgotten. Theorem 1. 
In virtue of its greater retention value it will benefit more from repeti- 
tion than the one learned second. 

Case 2.—If two sets of similar material, learned to different criteria 
on different occasions, are found to possess the same recall value at a 
given time afterwards, the one learned first was learned to a higher 
criterion than the one learned second. Theorem 2. It will benefit 
more from repetition than the one learned second. Case 2 is deducible 
from the Ebbinghaus curve of forgetting; theorem 2, from Jost’s 
empirical data. 

Finally, it should follow that, since testing reviews are more effec- 
tive when placed at early positions, the effectiveness of such reviews 
should vary directly with the degree to which the material is originally 
learned. Moreover, for material that is well learned, recall, as in the 
form of a test, should be more productive than relearning, as in reread- 
ing. There should be a degree of learning at which the two forms are 
equal and below which rereading becomes more productive than recall. 


REFERENCES 


Bridge, M.: Effect on Retention of Different Methods of Revision. Australian
    Council for Educational Research, Melbourne University Press in association
    with Oxford University Press, 1934.

Ebbinghaus, H.: Memory (translation by Ruger). Teachers College, Columbia
    University, 1913.

Gates, A. I.: “Recitation as a factor in memorizing.” Arch. Psychol., Vol. VI,
    No. 40, 1917.

Jordan, A. M.: Educational Psychology. Henry Holt, 1933, p. 185.

Luce, Edna: The Effect of Pretests and Posttests upon Retention. Unpublished
    Master’s Thesis, Univ. Iowa, 1939.

Lyon, D. O.: “The relation of length of material to time taken for learning, and
    the optimum distribution of time: III.” J. educ. Psychol., Vol. V, 1914, pp.
    155-163.

McGeoch, J. A.: “Studies in retroactive inhibition: II. Relationships between
    temporal point of interpolation, length of interval, and amount of retroactive
    inhibition.” J. gen. Psychol., 1933.

Nagge, J. W.: “An experimental test of the theory of associative interference.”
    J. exp. Psychol., Vol. XVIII, 1935, pp. 663-682.

Peterson, H. A., et al.: “Some measurements of the effects of reviews.” J. educ.
    Psychol., Vol. XXVI, 1935, pp. 65-72.

Spitzer, H.: “Studies in retention.” J. educ. Psychol., Vol. XXX, 1939, pp. 641-656.

Starch, D.: Educational Psychology. Macmillan, 1920, p. 158.

Tsai, Loh-seng: “The relation of retention to the distribution of relearning.”
    J. exp. Psychol., Vol. X, 1927, pp. 30-39.













SOME PITFALLS IN THE STATISTICAL ANALYSIS OF 
DATA EXPRESSED IN THE FORM OF IQ SCORES 


ROBERT W. B. JACKSON 


Department of Educational Research, Ontario College of Education, University of 
Toronto 


The scores obtained on mental tests are very often, if not generally, 
expressed in the form of intelligence quotients. The tests are scaled 
and standardized in such a way that the scores may be expressed in 
units of what has been termed “mental age,” i.e., in years and months.
The IQ score for any particular individual is obtained by dividing his
mental age, as determined by his score on the mental test, by his
chronological age and multiplying this ratio by 100. If we denote by
$x_i$ the mental age and by $y_i$ the chronological age of the $i$-th individual,
then the IQ of the individual is defined by

$$z_i = \mathrm{IQ}_i = \frac{x_i}{y_i} \times 100 \tag{1}$$


It will be noted that if the mental age and chronological age are 
equal, then the IQ equals 100. The mental tests are generally so 
arranged that an IQ of 100 is the expected score of an individual in a 
particular chronological age-group. Thus, we find, in practice, that
the IQ scores vary about 100 and certain levels may be fixed defining 
mental deficiency, for example, or exceptional ability. 
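
Equation (1) amounts to a one-line computation. The sketch below is in Python; the function name `iq` and the example pupil are hypothetical, and ages are taken in months so that both ages are in the same unit.

```python
def iq(mental_age_months: float, chronological_age_months: float) -> float:
    """Ratio IQ as in equation (1): 100 * mental age / chronological age."""
    if chronological_age_months <= 0:
        raise ValueError("chronological age must be positive")
    return 100.0 * mental_age_months / chronological_age_months


# Hypothetical pupil: mental age 10 years 0 months, chronological age 8 years.
print(iq(120, 96))   # 125.0
print(iq(96, 96))    # 100.0
```

Because the score is a ratio of two measured quantities, its statistical behavior depends on both of them, which is the point developed in the remainder of the paper.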

For this purpose the IQ scores are very useful, or rather convenient, 
but it would seem that one important point has frequently been overlooked
or forgotten. The $z$, or IQ, scores are disguised indices and,
therefore, particular care must be taken in the statistical analysis of
data expressed in this form. Karl Pearson pointed out some of the
difficulties and pitfalls in work of this kind as early as 1897,¹ and
Thomson and Pintner discussed the educational aspects of the problem
fifteen years ago,² but the results do not appear to be widely known or
fully appreciated.

¹ “On a Form of Spurious Correlation which may arise when Indices are used
in the Measurement of Organs.” Proc. Roy. Soc., Vol. LX, p. 489.
² Thomson, G. H., and Pintner, R.: “Spurious Correlation and Relationship
between Tests.” Journal of Educ. Psych., Vol. XV, Oct., 1924, pp. 433-444.

Some of the difficulties involved are discussed in this paper, and it is
hoped that the discussion and examples given will make it clear that
great care must be taken in the interpretation of the statistical results
of such analyses. In many cases the results are meaningless, in other
cases definitely misleading, but this is not necessarily true of all such
cases. This is one of the unfortunate features as each case must be
considered separately; the procedure which is valid in one case may be
invalid in another apparently similar one. However, if all the factors
involved in any particular case are carefully examined, and especially
their relations to each of the other factors, it should always be possible
to determine what interpretation, or interpretations, may be placed
on the results.
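Denote by:

$N$ = the number of observations in the sample
$\sum$ = summation for all $N$ values of $i$
$M_z$ = mean IQ score
$M_x$ = mean mental age
$M_y$ = mean chronological age
$S_x$ = standard deviation of the distribution of mental ages
$S_y$ = standard deviation of the distribution of chronological ages
$S_z$ = standard deviation of the distribution of IQ scores
$r_{xy}$ = correlation between mental age and chronological age

then

$$M_z = \frac{1}{N}\sum z_i, \qquad M_x = \frac{1}{N}\sum x_i, \qquad M_y = \frac{1}{N}\sum y_i \tag{2}$$

$$N S_z^2 = \sum z_i^2 - \frac{(\sum z_i)^2}{N}, \qquad N S_x^2 = \sum x_i^2 - \frac{(\sum x_i)^2}{N}, \qquad N S_y^2 = \sum y_i^2 - \frac{(\sum y_i)^2}{N} \tag{3}$$

$$r_{xy} = \frac{\sum x_i y_i - \dfrac{(\sum x_i)(\sum y_i)}{N}}{\sqrt{\left\{\sum x_i^2 - \dfrac{(\sum x_i)^2}{N}\right\}\left\{\sum y_i^2 - \dfrac{(\sum y_i)^2}{N}\right\}}} \tag{4}$$
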
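Denote by:

$$X_i = x_i - M_x, \qquad Y_i = y_i - M_y \tag{5}$$

$$V_x = \frac{S_x}{M_x}, \qquad V_y = \frac{S_y}{M_y} \tag{6}$$

Using the value of $z_i = 100\,x_i/y_i$ from equation (1) we have

$$M_z = \frac{1}{N}\sum z_i = \frac{100}{N}\sum \frac{x_i}{y_i} \tag{7}$$

Using (5), we may write

$$\frac{x_i}{y_i} = \frac{M_x + X_i}{M_y + Y_i} = \frac{M_x}{M_y}\left(1 + \frac{X_i}{M_x}\right)\left(1 + \frac{Y_i}{M_y}\right)^{-1} \tag{8}$$

If the values of $(Y_i/M_y)^2 < 1$, and of magnitude such that we may
ignore powers of $Y_i/M_y$ higher than the second, we may expand
$\left(1 + \frac{Y_i}{M_y}\right)^{-1}$ in a binomial series and write

$$\frac{x_i}{y_i} = \frac{M_x}{M_y}\left(1 + \frac{X_i}{M_x}\right)\left\{1 - \frac{Y_i}{M_y} + \left(\frac{Y_i}{M_y}\right)^2\right\} \tag{9}$$

If the values of $X_i/M_x$ are of the same order of magnitude, we find that

$$\sum \frac{x_i}{y_i} = N\,\frac{M_x}{M_y}\left\{1 - r_{xy} V_x V_y + V_y^2\right\} \tag{10}$$

and

$$M_z = 100\,\frac{M_x}{M_y}\left\{1 - r_{xy} V_x V_y + V_y^2\right\} \tag{11}$$

Under the same conditions, we find

$$S_z = 100\,\frac{M_x}{M_y}\sqrt{V_x^2 - 2 r_{xy} V_x V_y + V_y^2} \tag{12}$$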

CORRELATION OF IQ SCORES 


In certain cases, two mental tests, or two forms of the same mental 
test, are given to the same group of students. The scores of the
individuals on the tests are expressed in the form of IQ’s, and the
relationship between the two sets of measurements obtained by correlating the
IQ scores. Actually, we should distinguish between two possible 
procedures: 

(1) the two tests are given at approximately the same time, and 

(2) the second test is given after a certain length of time has 
elapsed, say k months after the first test. 

From the statistical point of view, the two methods will not give 
comparable results: The second method introduces a constant change 
(k months) in the chronological ages of all the pupils but the mental 
ages of the individuals will not necessarily be changed in the same 
manner. For this reason, therefore, we will consider only the first case 
in the following discussion. 

Denote by:

$x_{1i}$ = the mental age of the $i$-th individual as determined by his
     score on the first test;
$x_{2i}$ = the mental age of the $i$-th individual as determined by his
     score on the second test;
$y_i$ = the chronological age of the $i$-th individual;
$z_{1i}$, $z_{2i}$ = the IQ of the $i$-th individual as determined by his score on
     the first and second test, respectively;
$r_{x_1 x_2}$ = the correlation between the mental age scores on the first
     and second tests;
$r_{x_1 y}$ = the correlation between the mental age on the first test
     and chronological age;
$r_{x_2 y}$ = the correlation between the mental age on the second test
     and chronological age;
$M_{x_1}$ = mean mental age on first test;
$M_{x_2}$ = mean mental age on second test;
$S_{x_1}$ = standard deviation of the distribution of mental age on the
     first test;
$S_{x_2}$ = standard deviation of the distribution of mental age on
     the second test.

$$V_{x_1} = \frac{S_{x_1}}{M_{x_1}}, \qquad V_{x_2} = \frac{S_{x_2}}{M_{x_2}}, \qquad V_y = \frac{S_y}{M_y} \tag{13}$$

We may define the correlation, $r_{z_1 z_2}$, between the two sets of IQ
scores in the form

$$r_{z_1 z_2} = \frac{\sum z_{1i} z_{2i} - \dfrac{(\sum z_{1i})(\sum z_{2i})}{N}}{N S_{z_1} S_{z_2}} \tag{14}$$

This may be reduced, following the method used above, to the form

$$r_{z_1 z_2} = \frac{V_y^2 - r_{x_1 y} V_{x_1} V_y - r_{x_2 y} V_{x_2} V_y + r_{x_1 x_2} V_{x_1} V_{x_2}}{\sqrt{\left\{V_{x_1}^2 - 2 r_{x_1 y} V_{x_1} V_y + V_y^2\right\}\left\{V_{x_2}^2 - 2 r_{x_2 y} V_{x_2} V_y + V_y^2\right\}}} \tag{15}$$

If we assume that

$$V_{x_1} = V_{x_2} = V_y \tag{16}$$

equation (15) reduces to

$$r_{z_1 z_2} = \frac{1 - r_{x_1 y} - r_{x_2 y} + r_{x_1 x_2}}{2\sqrt{(1 - r_{x_1 y})(1 - r_{x_2 y})}} \tag{17}$$

The value of the correlation coefficient $r_{z_1 z_2}$ will be determined by
the relative magnitudes of the three intercorrelations $r_{x_1 y}$, $r_{x_2 y}$, and
$r_{x_1 x_2}$. These three correlations are not independent as they must
satisfy the inequality

$$r_{x_1 x_2}^2 + r_{x_1 y}^2 + r_{x_2 y}^2 - 2\, r_{x_1 x_2}\, r_{x_1 y}\, r_{x_2 y} \le 1 \tag{18}$$

but, even so, our choice of values is sufficiently great; it is clear from
equations (15) and (17) that very strange results may be obtained. It
will be noticed that if the three variables $x_1$, $x_2$, and $y$ are independent
then $r_{x_1 x_2} = r_{x_1 y} = r_{x_2 y} = 0$ but $r_{z_1 z_2}$ will not be zero; in fact it will be of
the order of 0.5 if the conditions of (16) are satisfied. This value of the
correlation coefficient has been termed “spurious” correlation by
Pearson.
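
The spurious value of about 0.5 can be illustrated by simulation. The Python sketch below evaluates equation (17), as read here, for zero intercorrelations and then draws three independent sets of simulated ages with equal coefficients of variation; the function names, the seed, and the normal distributions used are assumptions made only for this illustration.

```python
import math
import random


def iq_correlation_eq17(r_x1y, r_x2y, r_x1x2):
    """Equation (17): IQ correlation when V_x1 = V_x2 = V_y."""
    numerator = 1 - r_x1y - r_x2y + r_x1x2
    return numerator / (2 * math.sqrt((1 - r_x1y) * (1 - r_x2y)))


def pearson_r(a, b):
    """Product-moment correlation of two equally long lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    saa = sum((p - ma) ** 2 for p in a)
    sbb = sum((q - mb) ** 2 for q in b)
    return sab / math.sqrt(saa * sbb)


print(iq_correlation_eq17(0.0, 0.0, 0.0))   # 0.5

rng = random.Random(7)              # fixed seed so the run is repeatable
n = 5000
# Three independent simulated variables, each with mean 120 and s.d. 12.
x1 = [rng.gauss(120, 12) for _ in range(n)]
x2 = [rng.gauss(120, 12) for _ in range(n)]
y = [rng.gauss(120, 12) for _ in range(n)]
z1 = [100 * a / c for a, c in zip(x1, y)]   # IQ on first test
z2 = [100 * b / c for b, c in zip(x2, y)]   # IQ on second test
print(round(pearson_r(x1, x2), 3), round(pearson_r(z1, z2), 3))
```

The two mental ages are uncorrelated by construction, yet the two sets of IQ scores share the common divisor $y$ and so correlate at roughly one-half.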

This extreme case may not occur in practice, but it is dangerous to 
attempt to interpret the results unless all the separate factors, and their 
inter-relationships, have been studied. 


CORRELATION OF IQ SCORES AND CHRONOLOGICAL AGE 


The results are found to be even more peculiar when we examine 
the relationship between the IQ scores and chronological age of a
group. Using the values defined above, we have

$$r_{zy} = \frac{\sum z_i y_i - \dfrac{(\sum z_i)(\sum y_i)}{N}}{N S_z S_y} \tag{19}$$

which reduces to

$$r_{zy} = \frac{r_{xy} V_x - V_y}{\sqrt{V_x^2 - 2 r_{xy} V_x V_y + V_y^2}} \tag{20}$$

If we assume that $V_x = V_y$, then

$$r_{zy} = \frac{r_{xy} - 1}{\sqrt{2(1 - r_{xy})}} = -\sqrt{\frac{1 - r_{xy}}{2}} \tag{21}$$

But $-1 \le r_{xy} \le +1$, so from equation (21) it follows that

$$-1 \le r_{zy} \le 0 \tag{22}$$


The correlation between IQ scores and chronological age, therefore,
will always be negative—if our conditions are satisfied—unless there 
exists a perfect positive relationship between the mental and the 
chronological ages, in which case it will be zero. Or, from equation
(20), $r_{zy} \le 0$ unless $r_{xy} > V_y/V_x$; since $V_x$ and $V_y$ will seldom be exactly
equal, it will be better to use the latter relationship.

The result, therefore, of correlating IQ scores and chronological age
will be a negative coefficient, and this will be meaningless unless we 
interpret it in terms of the relationship between mental and chronologi- 
cal ages. The usual interpretation of the correlation coefficient may 
even be misleading if applied in cases such as this one. 
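
A short sketch in Python shows how the correlation between IQ and chronological age behaves under equations (20) and (21) as read here. The function names and the trial values of $r_{xy}$, $V_x$, and $V_y$ are chosen only for illustration.

```python
import math


def r_zy_general(r_xy, v_x, v_y):
    """Equation (20): correlation of IQ with chronological age.

    Undefined when the expression under the root is zero
    (for example r_xy = 1 with v_x = v_y).
    """
    return (r_xy * v_x - v_y) / math.sqrt(
        v_x ** 2 - 2 * r_xy * v_x * v_y + v_y ** 2)


def r_zy_equal_v(r_xy):
    """Equation (21): the special case V_x = V_y."""
    return -math.sqrt((1 - r_xy) / 2)


for r_xy in (-1.0, 0.0, 0.5, 0.9):
    print(r_xy, round(r_zy_equal_v(r_xy), 3),
          round(r_zy_general(r_xy, 0.1, 0.1), 3))

# With unequal coefficients of variation the sign can change:
# here r_xy = 0.5 exceeds V_y / V_x = 0.25, so the result is positive.
print(round(r_zy_general(0.5, 0.2, 0.05), 3))
```

The first loop gives only negative values, in line with (22); the last line gives a positive value because $r_{xy}$ exceeds $V_y/V_x$.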


ELIMINATING THE EFFECT OF CHRONOLOGICAL AGE 


It is interesting to see what happens when we eliminate the effect 
of chronological age, by the partial correlation technique, from the 
correlation between two sets of IQ scores. Using the above notation, 
the formula may be written

$$r_{z_1 z_2 \cdot y} = \frac{r_{z_1 z_2} - r_{z_1 y}\, r_{z_2 y}}{\sqrt{\left\{1 - r_{z_1 y}^2\right\}\left\{1 - r_{z_2 y}^2\right\}}} \tag{23}$$

where $r_{z_1 z_2 \cdot y}$ denotes the correlation between IQ scores with
chronological age held constant.

Using the results given in equations (17) and (21), assuming
$V_{x_1} = V_{x_2} = V_y$, equation (23) reduces to

$$r_{z_1 z_2 \cdot y} = \frac{r_{x_1 x_2} - r_{x_1 y}\, r_{x_2 y}}{\sqrt{\left\{1 - r_{x_1 y}^2\right\}\left\{1 - r_{x_2 y}^2\right\}}} \tag{24}$$

and for the general case, without the assumption that $V_{x_1} = V_{x_2} = V_y$,
we find, after substituting the values from (15) and (20) in (23),

$$r_{z_1 z_2 \cdot y} = \frac{r_{x_1 x_2} - r_{x_1 y}\, r_{x_2 y}}{\sqrt{\left\{1 - r_{x_1 y}^2\right\}\left\{1 - r_{x_2 y}^2\right\}}} \tag{25}$$

In both cases, therefore, we find that

$$r_{z_1 z_2 \cdot y} = r_{x_1 x_2 \cdot y} \tag{26}$$

where $r_{x_1 x_2 \cdot y}$ is the correlation between the mental age scores with
chronological age held constant.

The net result of all the calculations, therefore, would be a return 
to the statistical analysis of the original mental age scores—certainly a 
tedious and roundabout method of achieving such a result! 

It should be noted that $V_{x_1}$, $V_{x_2}$, $V_y$, $x_1/M_{x_1}$, $x_2/M_{x_2}$, $y/M_y$ are
assumed to be so small that we may ignore powers higher than the
second. These are the fundamental assumptions underlying the above
discussion and they may not be exactly satisfied in practice. The 
theoretical and actual results, therefore, may not always agree. 


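Examples.

(1) $N = 20$

$r_{x_1 x_2} = -0.037$ | $r_{z_1 z_2} = 0.632$
$r_{x_1 y} = 0.159$ | $r_{z_1 y} = -0.6906$
$r_{x_2 y} = -0.150$ | $r_{z_2 y} = -0.8532$
$r_{x_1 x_2 \cdot y} = -0.013$ | $r_{z_1 z_2 \cdot y} = 0.114$

(2) $N = 20$

$r_{x_1 x_2} = 0.559$ | $r_{z_1 z_2} = 0.007$
$r_{x_1 y} = 0.565$ | $r_{z_1 y} = -0.2735$
$r_{x_2 y} = 0.975$ | $r_{z_2 y} = 0.1603$
$r_{x_1 x_2 \cdot y} = 0.044$ | $r_{z_1 z_2 \cdot y} = 0.054$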


These examples are fictitious and there are only twenty cases in
each, but they illustrate clearly what may happen in practice. In the
first case we start with unrelated variables (the correlations relating to
mental age are not significantly different from zero) and obtain high,
but spurious, correlations between the IQ scores and between the IQ
scores and chronological age. If we correct for chronological age, i.e.,
partial out the effect of chronological age, we obtain correlations
near zero.

In the second case we start with high mental age correlations and find
IQ correlations near zero. Eliminating the effect of chronological age from
both $r_{x_1 x_2}$ and $r_{z_1 z_2}$ we obtain correlations near zero. These examples
indicate that very strange results, which may be meaningless, can be
obtained unless we consider carefully all the inter-relations
between the variables.
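
The partial coefficients quoted for Examples (1) and (2) can be recomputed from the tabulated zero-order values with the usual first-order partial correlation formula. The sketch below is a check, not part of the original analysis; the function name is hypothetical, and the inputs are the tabulated figures read with x as mental age, z as IQ, and y as chronological age.

```python
import math


def partial_r(r_12, r_1y, r_2y):
    """First-order partial correlation of variables 1 and 2 with y held constant."""
    return (r_12 - r_1y * r_2y) / math.sqrt((1 - r_1y**2) * (1 - r_2y**2))


# Example (1): mental age correlations, then IQ correlations.
ex1_ma = partial_r(-0.037, 0.159, -0.150)
ex1_iq = partial_r(0.632, -0.6906, -0.8532)

# Example (2): mental age correlations, then IQ correlations.
ex2_ma = partial_r(0.559, 0.565, 0.975)
ex2_iq = partial_r(0.007, -0.2735, 0.1603)

for label, value in [("ex1 MA", ex1_ma), ("ex1 IQ", ex1_iq),
                     ("ex2 MA", ex2_ma), ("ex2 IQ", ex2_iq)]:
    print(f"{label}: {value:.4f}")
```

The recomputed values agree with the tabulated partial coefficients to within rounding in the third decimal place, and within each example the mental age and IQ partials are of similar, small size.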


Example 3. Fraternal Twins, Unlike-sex Pairs—From Appendix B, 
pages 121-123, Wingfield, Alex. H.: Twins and Orphans, the Inheritance of 
Intelligence. J. M. Dent & Sons, Ltd., 1928. 


$r_{x_1 x_2} = 0.396$ | $r_{z_1 z_2} = 0.262$
$r_{xy} = 0.513$ | $r_{zy} = -0.334$


It will be noticed that there is a marked negative correlation
between the IQ scores and chronological age, $-0.334$. Wingfield
concludes (page 80) “It can be seen that there is a gradual decrease in
intelligence with age,” but this is not the correct interpretation. If we
substitute the value $r_{xy} = 0.513$ in equation (21) we find that $r_{zy}$ should
be about $-0.36$; the negative correlation between the IQ scores and
chronological age simply means that there is a positive correlation
between mental age and chronological age. The negative correlation
is due to the fact that the IQ scores are indices, and is meaningless.


Example 4.—From Growth Data for Use in Educational Statistics. Depart- 
ment of Educational Research, Ontario College of Education, University of 
Toronto, Table IIB; Mental and Physical Measurements of eighty-five girls. 


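$r_{x_1 x_2} = 0.877$ | $r_{z_1 z_2} = 0.860$
$r_{x_1 y} = 0.242$ | $r_{z_1 y} = -0.198$
$r_{x_2 y} = 0.294$ | $r_{z_2 y} = -0.074$

$r_{x_1 x_2 \cdot y} = 0.869$ | $r_{z_1 z_2 \cdot y} = 0.864$

$V_{x_1} = 0.120$; $V_{x_2} = 0.149$; $V_y = 0.061$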


This case is peculiar in that:

(1) the correlations between mental ages and between IQ’s are
approximately equal;

(2) the values of the V’s are not the same, $V_y$ is particularly small;

(3) there is practically no relationship between mental age, or IQ
scores, and chronological age.


It does not matter here whether we use mental ages or intelligence 
quotients, but the other examples show that this is not always true. 
We conclude, therefore, that every case must be considered on its own 
merits, and that in all cases we must study all possible relationships 
between the variables. 

There are two other measures in common use, the educational
quotient (denoted by EQ) and the accomplishment quotient (denoted
by AQ), which are also disguised indices. These are defined by the
equations

$$\mathrm{EQ} = \frac{\mathrm{EA}}{\mathrm{CA}} \times 100$$

$$\mathrm{AQ} = \frac{\mathrm{EA}}{\mathrm{MA}} \times 100$$

where EA denotes educational age, MA denotes mental age, and CA
denotes chronological age. The mental age is determined (as explained
above) from the score on a standardized intelligence test and the
educational age is determined in a similar manner from the score on a
standardized educational test. It is clear that the results deduced
above for the IQ’s apply equally well to the EQ’s and AQ’s, or to any other
such index. Formulas showing the relationship between the various
factors for these quotients may be obtained from those given by a
slight change of notation. It is sufficient for our purpose to point out
that the position here is exactly the same and that the precautions
which must be taken when analyzing scores expressed as IQ’s must also
be taken when we are dealing with EQ’s or AQ’s.
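
As a small illustration of how the three quotients are related, the sketch below computes them for one hypothetical pupil. The ages are invented for the example and the helper name is an assumption; the only point is that each quotient is a ratio of two ages, so that AQ equals 100 x EQ/IQ.

```python
def quotient(numerator_age, denominator_age):
    """Generic age quotient: 100 times the ratio of two ages in the same units."""
    return 100.0 * numerator_age / denominator_age


# Hypothetical pupil; all ages in months.
ca, ma, ea = 120, 132, 126

iq = quotient(ma, ca)  # intelligence quotient, MA/CA
eq = quotient(ea, ca)  # educational quotient, EA/CA
aq = quotient(ea, ma)  # accomplishment quotient, EA/MA

print(iq, eq, round(aq, 2))
print(round(100.0 * eq / iq, 2))  # same value as AQ
```

Because every one of these measures has an age in its denominator, each is an index in the sense discussed above and is open to the same spurious correlations.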





THE MEASUREMENT OF MENTAL GROWTH 


JOHN P. HERRING 


Seattle, Washington 


A good method of measuring mental growth and decline from the 
beginning of life through old age is not yet in general use. The chief 
difficulty lies in the concept of intelligence quotient itself, MA/CA. 
This concept breeds confusion at all periods of life, confusion which, if 
it be less during childhood, is even then great. But since quotients 
themselves are sound and likely to survive, it would be well to have 
consistent ones. 

There is a suitable, basic concept; namely, the rate of change of 
mental level. The ratio of the individual’s rate to the rate which is 
average for his group may be called mental quotient. This group may 
usually be his age group, but it may also sometimes usefully be his 
occupational, family, or other sociological group. 

Let us use these symbols:

b | birth
1 | first
2 | second
i | individual
a | average-for-age
L | mental level in a scale having equal units
MQ | mental quotient
g | gain
t | time


The equation, IQ = MA/CA, may be regarded as an ellipsis for

$$\frac{\mathrm{MA2} - \mathrm{MAb}}{\mathrm{CA2} - \mathrm{CAb}} = \frac{g}{t},$$

expressing an average rate of change of mental
age from birth to the time of testing, whether growth during the
interval is uniform or not. The rate of change at the time of testing
may be found by $\frac{\mathrm{MA2} - \mathrm{MA1}}{\mathrm{CA2} - \mathrm{CA1}}$, or $g/t$, in which “t” is, for example, one
year from eleven to twelve. The ratio of an individual’s rate to the
rate which is average for his age is $\frac{g_i/t}{g_a/t}$, or $\frac{(\mathrm{MA2} - \mathrm{MA1})_i}{(\mathrm{MA2} - \mathrm{MA1})_a}$; or, using
L, instead of MA, and calling the ratio MQ,

$$\mathrm{MQ} = \frac{(\mathrm{L2} - \mathrm{L1})_i}{(\mathrm{L2} - \mathrm{L1})_a}.$$

The L’s would be found by consulting a one-page table of average-for-age
L’s, or, if the score is L, by using the score.
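
A minimal Python sketch of the ratio just defined may make the bookkeeping concrete. The function name and the four level values are hypothetical; the sketch assumes only that both pairs of levels are expressed on the same equal-unit L scale and cover the same interval t, so that t cancels.

```python
def mental_quotient(l1_ind, l2_ind, l1_avg, l2_avg):
    """MQ: the individual's gain in L divided by the average-for-age gain over the same interval."""
    gain_ind = l2_ind - l1_ind   # (L2 - L1) for the individual
    gain_avg = l2_avg - l1_avg   # (L2 - L1) average for his age
    return gain_ind / gain_avg


# Hypothetical levels on an equal-unit L scale, measured one year apart.
mq = mental_quotient(l1_ind=40.0, l2_ind=46.0, l1_avg=38.0, l2_avg=42.0)
print(mq)
```

In this invented case the individual gains six units while the average for his age is four, so his rate is one and a half times the average rate, whatever his absolute level.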


Now the equation, IQ = MA/CA, has certain advantages: 

(1) It has been useful in the hands of a generation of competent 
persons. (This statement and others have been argued over and 
again, and will not be argued here.) 

(2) It has bequeathed to us a number of constants-with-meanings, 
now integrated into our culture, where they are a tradition of great 
practical value. 

(3) It has ready interpretations for the period of childhood. 

(4) It is simple to compute. Its suppressed terms, MAb and CAb, 
are not measured, are conveniently taken as zero, and are treated as 
having no variation. Time is thus saved.

But the equation IQ = MA/CA has also disadvantages: 

(1) Its utilization of MA is weaker than need be, unless absolute 
zero is used, unless its units are equal each to each, and unless the 
growth of intellect averages as if uniform. 

(2) It does not give the rate of change of mental level at testing- 
time, unless the average rate of change from birth to test should, after 
further study, turn out to be the same as if change is uniform. 

(3) While it does give an average rate of growth from birth to 
testing-time, except for whatever error may be due to mal-use of zero 
and to inequality of units, it may not give even an approximately true 
rate for shorter intervals. 

If intellect grows uniformly from birth through childhood, then 
an average child may have an MA/CA of 100 from birth through 
childhood. 

If intellect grows nearly uniformly from birth through childhood, 
then the average child’s constant IQ of 100 need not be very confusing 
during that period. 

If intellect actually grows with negative acceleration, then, no 
matter how it is assumed to grow, an average child may actually have 
an MA/CA of 270 at age one, 240 at age four, 170 at age eight, and 
120 at age twelve. Such a child would now usually be assigned an IQ 
of 100. While the Binet IQ was so adjusted that 100 arbitrarily 
became the average for each age group, the MQ, by contrast, is 
designed to disclose rates of mental growth, whatever they may in 
fact be. 

If intellect grows slowly at first, then faster, and then uniformly,
then, no matter how it is assumed to grow, an average child may have an
MA/CA of 25 at age one, 50 at age three, 85 at age eight, never reaching
100, and never remaining long the same.







(4) After childhood, it gives no readily interpretable, no soundly 
interpretable mental ages or intelligence quotients. During adoles- 
cence, the meanings of MA and IQ gradually glide from their childhood 
meanings toward whatever-it-is they come to mean during adulthood. 
And during adulthood, they are, at the very best, notoriously perplex- 
ing; and no matter what the correlation between the sigmas of IQ’s and 
the sigmas of standard measures, they remain notoriously perplexing. 

(5) After childhood, it does not validly predict mental ages, rates 
of change, or intelligence quotients; and it may not be doing so during 
childhood or at any other time. 

(6) Its suppressed terms, MAb and CAb, may at first be thought 
free from error and from variation, whereas they may be, both of them, 
actually fallible and variable. Moreover, they are confused by the 
problem of beginning, whether to take them as of birth or as of con- 
ception—so that they may carry both constant and variable errors. 
If they were to be taken as of the time of conception, the assumptions 
would be involved that the cell resulting from union of sperm and ovum 
has in its first moment zero intelligence, or, if it has not, that the 
amount of its intelligence is insignificant, and that the variation of the 
intelligence present in such cells is also insignificant. Now it is, of 
course, unsafe to make these assumptions, because they are highly 


speculative, but they may be better than those needed for MA/CA. 
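
Disadvantages (2) and (3) turn on the difference between an average rate reckoned from birth and the rate in force at the time of testing. The sketch below illustrates that difference under a purely hypothetical, negatively accelerated growth curve; the square-root form and the scale factor are assumptions chosen for the illustration, not a proposed law of growth.

```python
import math


def level(age_years):
    """Hypothetical negatively accelerated growth curve (an assumption, not a fitted law)."""
    return 10.0 * math.sqrt(age_years)


def average_rate_from_birth(age_years):
    """Gain since birth divided by elapsed time: the MA/CA style of rate."""
    return (level(age_years) - level(0.0)) / age_years


def rate_at_testing(age_years, interval=1.0):
    """Gain over the most recent interval divided by that interval: the g/t style of rate."""
    return (level(age_years) - level(age_years - interval)) / interval


for age in (4.0, 8.0, 12.0):
    print(age, round(average_rate_from_birth(age), 2), round(rate_at_testing(age), 2))
```

Under this assumed curve the rate averaged from birth overstates the current rate at every age, so a quotient built on the former cannot be read as a statement about growth at testing time.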


Now the equation, $\mathrm{MQ} = \frac{(\mathrm{L2} - \mathrm{L1})_i}{(\mathrm{L2} - \mathrm{L1})_a}$, has certain advantages:





(1) It is the logical, mathematical equation for comparing the rate
of change, or slope, of mental level of the individual, with the rate, or
slope, which is average for his age-group. It is $\frac{g_i/t}{g_a/t}$, the numerator and
the denominator of which are somewhat analogous to $dy/dx$ of the
calculus.

(2) It provides just what is needed, and what the IQ was sometimes 
supposed to provide but could not: First, individual compared with 
average rate of growth at any time; and, second, some prediction about 
intellect, obtainable at any time and identifying important characteris- 
tics of a person’s life-long intellectual trend. The constant IQ was an 
attractive idea and it was simple; but life itself is not so simple, and 
the IQ is not a constant throughout life. Intellect does not grow 
uniformly from birth to death, and it may not do so at any time. The 
prediction takes the form of the three life graphs (level, rate, and
quotient), which provide information analogous to that which the
constant IQ was sometimes thought to provide but did not. These
graphs present interpretable levels, rates, and quotients for the whole 
of life. 

(3) From records obtainable at any period of life it can be used to 
estimate levels, rates, and quotients for all periods. L1, L2, L3, . . .
may be plotted upon a page bearing the family of human curves of 
intellect for the whole of the life-span. Thus information would be 
supplied, tentatively, with cumulative probability. 

(4) It avoids certain false childhood predictions of adolescent and 
adult IQ’s and certain false adolescent and adult estimates of childhood 
IQ’s. 

(5) It may contribute to the solution of problems—constancy of 
the IQ, laws of mental growth, significance of rates of growth common 
to identical and non-identical twins, siblings, families, races, sexes, and 
occupational and other sociological groups, influence of environments 
upon persons, groups, and cultures. 

(6) It can be used for persons whose age is not known. It does not 
have CA as a term, suppressed or not. The work of verifying CA, and
the unreliability due to its use, are avoided. It has no suppressed 
terms, but is overt. For the individual whose age is not known, there 
can be found two L’s and one g/t—data sufficient to identify tentatively 
his L-graph and, therefore, his rate-graph and his MQ-graph, which are 
nothing but functions of L. Then his MQ’s and ages can be read from 
his graphs. Such findings can be improved by testing at intervals and 
it can be determined how much confidence to repose in them. It is 
conceivable that individuals exist whose age and MQ can thus be 
usefully intimated. 


(7) It is free from error resulting from the misuse of a zero. 

But the equation, $\mathrm{MQ} = \frac{(\mathrm{L2} - \mathrm{L1})_i}{(\mathrm{L2} - \mathrm{L1})_a}$, has certain disadvantages:

(1) It depends upon two differences and a ratio. While the differ- 
ence in the denominator can be made as reliable as it is worth while to 
make it, that in the numerator, and the ratio, make for unreliability. 

But MA/CA is only apparently more simple. On account of its
suppressed terms, its actual reliability may be partly spurious; i.e., it
may have no right to be as reliable as, in fact, it becomes when it
suppresses one half of its variables. It, too, really has two differences
and a ratio. The disturbing variation of the suppressed terms has, of
course, not disappeared as a problem just because the terms are taken
to be zero when they are not, and to be constants when they are
variables. MA/CA is, however, probably more reliable than it would be if
it were not for the great length of time between birth and testing. It 
may be that the best way to obtain g/t and MQ for the individual is to 
identify his L-graph cumulatively and to read from it, and from the 
other graphs, his g/t’s and MQ’s, and that these quantities, so deter- 
mined, will evince cumulative reliability and be reliable enough at an 
early stage. 

(2) The equation comes to 0/0 at some time during adulthood, 
passes through that value, and possibly remains in the neighborhood 
for years. It would, of course, be nonsense to call this ratio indeter- 
minate, and common sense to call it zero when it is reliable and when 
it is obviously a zero in series. 

(3) It requires at first two measurements instead of one and, there- 
fore, a delay before MQ can be found by the equation. But it is only 
the first MQ that requires delay; for the second MQ can utilize L2 of 
the first MQ as its L1—alternate MQ’s having no common terms, and 
being, in that respect, independent of each other. Institutions which 
now use cumulative records of mental level, whether or not in the form 
of mental ages, may already have enough data for the immediate 
computation of a number of consecutive MQ’s for each person meas- 
ured. Such institutions can, therefore, at once, subject their accumu- 
lated data to a new analysis. When MA’s are used in the equation, 
in the place of L’s, and the interval between measurements is short, the 
resulting MQ’s are likely to correlate highly throughout life with the 
MQ’s obtained by using L—in part because of a certain automatic 
correction, later described, for those inequalities, if any, which exist 
in mental age units. But note that the very first L found for an
individual whose age is known identifies tentatively his whole life 
graph of mental level, and therefore of rate and of MQ. The moment 
one L is obtained, an MQ can thus be read from his graphs. One L 
begins to identify a person’s three life graphs—of level, rate, and 
quotient; successive L’s tend to improve the identification; and the 
degree of confidence to be placed in the identification can be fixed. 
Cumulative records may so closely indicate the individual’s graph of 
mental level among those of the family of graphs, that his g/t’s and 
MQ’s read from his graphs will be reliable enough. 

(4) It requires test construction, and experimentation, before use.

(5) It would have to overcome the vested interests of money, time, 
convenience, and inertia. 
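
Points (2) and (3) of this list can be sketched together: successive MQ’s computed from a cumulative record, each using the later level of the preceding pair as its earlier level, with the 0/0 case set to zero as the text suggests. The function name and the two series of levels are hypothetical, and the sketch assumes that the individual and the average-for-age levels are recorded at the same testing times.

```python
def consecutive_mqs(levels_ind, levels_avg):
    """MQ for each pair of successive measurements taken at the same testing times."""
    if len(levels_ind) != len(levels_avg):
        raise ValueError("both series need one level per testing time")
    mqs = []
    for k in range(1, len(levels_ind)):
        gain_ind = levels_ind[k] - levels_ind[k - 1]
        gain_avg = levels_avg[k] - levels_avg[k - 1]
        if gain_ind == 0 and gain_avg == 0:
            mqs.append(0.0)  # the text's convention for a 0/0 occurring in series
        else:
            # A zero average gain with a nonzero individual gain is left to fail loudly.
            mqs.append(gain_ind / gain_avg)
    return mqs


# Hypothetical cumulative record: four testings of one person and the matching averages.
record_ind = [40.0, 46.0, 50.0, 50.0]
record_avg = [38.0, 42.0, 46.0, 46.0]
print(consecutive_mqs(record_ind, record_avg))
```

With four testings on file, three MQ’s are available at once, which is the situation described for institutions that already keep cumulative records.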










This finishes the consideration of advantages and disadvantages. 

MQ has two requisites which it is appropriate to mention. 

(1) MQ requires the future construction of a new test of intelligence 
having units which are equal in a defined sense. An L-scale might be 
constructed, after the manner of Kelley, out of units sensed by com- 
petent judges, to be equal, or it might derive the equality of its units 
from CAVD of Thorndike—even if there is doubt about equality in 
widely separated portions of the L-scale. If each examinee works only 
on problems at or near his own level, MQ may turn out to be more 
surely comparable from one end of the scale to the other than either 
sensed units or CAVD units themselves; for if the units are too long by
any unknown amount in the numerator in one part of the scale, then 
they are also by approximately the same unknown amount too long in 
the denominator; and a correction, in the right direction, and of 
approximately the right amount, may result. 

(2) MQ requires that the two measurements of L be obtained, as 
nearly as feasible, within a period of uniform or nearly uniform change 
—a requirement usually permitted application. During childhood and 
adulthood, if change is somewhat uniform, there will be little problem 
in this respect with most individuals; while during infancy, adolescence, 
and senility, periods during which it is now not unreasonable to suspect 
the presence of positively or negatively accelerated change, its use may 
require a compromised interval—short enough to avoid much change 
in the rate of change but at the same time long enough to assure 
reliable difference and ratio. But, of course, an average slope of the 
curve can be found for any period, no matter what the acceleration. 
It may be that an interval of six months should be tried and that it 
might prove useful in all periods of life, or that the interval should be 
different for different periods; but it is, at any rate, a matter for future 
experiments. If a good test is used, the interval may be short. If the 
formula prove unreliable at certain periods, the difficulty may perhaps 
be overcome by lengthening intervals and cumulating records. But 
the interval should probably not be long enough to include widely 
different rates of change—exactly what MA/CA may now usually do, 
since the interval is from birth to measuring time, and no law of 
growth is yet generally accepted. 

The growth and decline of intellect can be treated by the methods 
of curve-fitting, of curve analysis, and of the calculus. The rates and 
directions of change of mental level can be represented by derivatives; 
maxima and minima and points of flexion can be found. The
characteristics of graphs of L, g/t, and MQ can thus be identified by
means of the calculus. A family of graphs, or of statistics, can be 
printed upon a single page, one for each person, as the background for 
his entries which, made at any period, tend to identify for all periods, 
and with cumulative accuracy, his changing mental levels, changing 
mental rates, and changing mental quotients. 













THE PROBLEM OF PSEUDO-FEEBLEMINDEDNESS: 
A REPLY 


GEORGE S. SPEER
Child Guidance Service, Springfield, Illinois 


There recently appeared in this Journal (1) a paper which presented a
startling innovation in the measurement of intelligence. Briefly, the
suggested procedure is this: The Stanford-Binet examination is
administered in the normal fashion. For the purpose of obtaining a
mental age and an IQ, however, “the mental age and IQ of upper
mental age attainment of the Binet was ascertained by adding the
number of mental age months earned at the upper limit to a newly
assumed basal taken at the year immediately below it” (1, p. 523).

This procedure is justified by two assumptions: (1) Retardation in
language development reduces the Stanford-Binet level of intelligence.
(2) “. . . an individual could pass all subtests below his highest level
of attainment, as expressed on the Binet, were it not for the lowering
effect of non-intellectual factors.”

As the procedure suggested involves so radical a change in the 
accepted IQ, it was felt that an examination of the method and its 
results would be advisable. 

Bijou has not presented any data to support his contention that
his procedure corrects for a lack of language development or inferior
mental control. He does present data which show that some subjects
gain and that others do not. Those who gain are then defined as
“pseudo-feebleminded.”

In preparing another study, we have tabulated the Binet performance
of six hundred ten subjects. From these data we have taken the
records of all (one hundred eight) children whose IQ’s are below 70.
We obtained the suggested highest Binet attainment and calculated
the mean gain in IQ. Gain was plotted against original Binet IQ.
The data are presented in Table I. It is apparent that the higher the
original IQ, the greater the gain by “highest Binet attainment.”

What has been accomplished by this procedure? So far as group
results are concerned, the relative positions of the subjects are the same,
but the differences between individuals have been increased. This does
not appear to us to be of any advantage in classifying our subjects.

Individually, some differences have appeared. Fifty-three per cent
of the subjects are still classed as feebleminded, but forty-seven per
cent are now borderline, dull, or average. Fifteen of these fifty
subjects have been given a thorough study and evaluation. They were
delinquent and emotionally unstable. If inferior mental control can
affect test results, these subjects should exhibit it. These fifteen
subjects made a mean gain of 14.8 IQ points by upper Binet attainment.
Only two would still be classed as feebleminded by psychometric
results. After psychiatric, psychological, and sociological studies
had been made, however, every one of these children had been classified
as feebleminded. There was nothing in their history of social
adjustment to suggest that the original psychometric results were incorrect.


TABLE I. INTELLIGENCE AND GAIN IN IQ BY UPPER BINET ATTAINMENT

(Columns 0-4 through 30-34 give the gain by upper Binet attainment.)

Binet IQ | 0-4 | 5-9 | 10-14 | 15-19 | 20-24 | 25-29 | 30-34 | N | Mean
30- | 4 | 1 | 2 | - | - | - | - | 7 | 5.55
40- | 3 | 6 | 3 | 1 | - | - | - | 13 | 7.75
50- | 10 | 11 | 4 | 1 | 2 | - | 1 | 29 | 8.20
60- | 14 | 13 | 17 | 8 | 5 | 1 | 1 | 59 | 13.35
Total | 31 | 31 | 26 | 10 | 7 | 1 | 2 | 108 | 9.30





























Bijou says (1, p. 522) that “since interest was focused on differences
in mental abilities and not on scatter as such, only the upper limits of
scatter were utilized.” It is obvious, however, that the greatest gain
will be made by the subject who (1) exhibits the greatest amount of 
scatter and (2) passes the fewest number of tests at each level. When 
we compare Binet IQ with upper Binet attainment IQ, we give a child 
credit for the things he could not do and penalize him for the things he 
did do. We may illustrate this by considering the hypothetical cases 
shown in Table II. 

Both A and B are feebleminded by test results, B somewhat more
retarded than A. Employing upper Binet attainment, however, A
gains slightly in IQ, but B gains thirty-four points and is now rated as
dull. Can we possibly believe that B is really so much superior to A?

Subject C, who is originally more deficient than A or D and of
approximately the same ability as B, is, by upper Binet attainment,
now rated as of the same ability as A and D, but considerably duller
than B. It is impossible to examine the performance records of these
four subjects on the original test and then accept the upper Binet
attainment classification on any grounds.


TABLE II. EFFECT OF UPPER BINET ATTAINMENT ON MENTAL AGE AND IQ

Subject | A | B | C | D
CA | 12-0 | 12-0 | 12-0 | 12-0
Test performance | V-123456 | V-123456 | V-123456 | VII-123456
 | VI-23456 | VI-1 | VI-1 | VIII-1
 | VII-23456 | VII-1 | VII-1 |
 | VIII-2 | VIII-1 | VIII-1 |
 | | IX-1 | |
 | | X-1 | |
 | | XI-1 | |
MA* | 6-10 | 6-0 | 5-6 | 7-2
IQ* | 56 | 50 | 45 | 59
MA† | 7-2 | 10-2 | 7-2 | 7-2
IQ† | 59 | 84 | 59 | 59
MA gain in months | 4 | 50 | 20 | 0
IQ gain | 3 | 34 | 14 | 0

* Revised Stanford-Binet.
† Upper Binet Attainment.
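
The figures for the four hypothetical subjects of Table II can be reproduced with a short Python sketch. Several points are assumptions rather than statements from the text: each test passed at year levels VI and above is credited with two months, the basal is taken as the highest level at which all six tests are passed, a CA of 12-0 is entered as 144 months, and IQ’s are truncated to whole numbers, which is how the tabled values appear to have been obtained. The function names are hypothetical.

```python
MONTHS_PER_TEST = 2   # assumed credit per test at year levels VI and above
TESTS_PER_LEVEL = 6
CA_MONTHS = 144       # chronological age 12-0


def binet_ma(passes):
    """Conventional mental age in months from {year_level: number_of_tests_passed}."""
    basal = max(lv for lv, n in passes.items() if n == TESTS_PER_LEVEL)
    credit = sum(n * MONTHS_PER_TEST for lv, n in passes.items() if lv > basal)
    return basal * 12 + credit


def upper_attainment_ma(passes):
    """Upper Binet attainment: assumed basal one year below the highest level with a pass."""
    top = max(lv for lv, n in passes.items() if n > 0)
    return (top - 1) * 12 + passes[top] * MONTHS_PER_TEST


def iq(ma_months, ca_months=CA_MONTHS):
    """IQ truncated to a whole number (an inference from the tabled values)."""
    return (100 * ma_months) // ca_months


subjects = {
    "A": {5: 6, 6: 5, 7: 5, 8: 1},
    "B": {5: 6, 6: 1, 7: 1, 8: 1, 9: 1, 10: 1, 11: 1},
    "C": {5: 6, 6: 1, 7: 1, 8: 1},
    "D": {7: 6, 8: 1},
}

results = {}
for name, passes in subjects.items():
    ma, uba = binet_ma(passes), upper_attainment_ma(passes)
    results[name] = (ma, iq(ma), uba, iq(uba))
    print(name, results[name], "IQ gain:", iq(uba) - iq(ma))
```

The sketch makes the mechanism plain: the upper attainment score depends only on the highest level reached and the tests passed there, so every success below that level is ignored and every failure below it is credited.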


If poor language development results in an IQ lower than the true
mental level, which may be corrected by employing the suggested
upper Binet attainment IQ, then we would expect subjects whose
language development is average or superior to obtain approximately
the same IQ by either method of computing mental age.


TABLE III. GAIN BY UPPER BINET ATTAINMENT OF SUPERIOR AND RETARDED CHILDREN

Group | 0 | 5 | 10 | 15 | 20 | 25 | 30 | 35 | 40 | 45 | N | Mean gain
Binet IQ below 70 | 30 | 31 | 26 | 11 | 7 | 1 | 2 | - | - | - | 108 | 9.30
Binet IQ 110 or above | 7 | 10 | 18 | 8 | 6 | 0 | 1 | 0 | 3 | 1 | 54 | 14.30









































We have compared the IQ’s obtained by both methods for the one
hundred eight feebleminded subjects and for fifty-eight subjects whose
Binet IQ’s are 110 or above. The data are presented in Table III.
The brighter subjects have gained almost twice as much, using upper
Binet attainment, as the feebleminded subjects. In other words,
subjects whose language development is superior have gained more by a
method intended to correct for poor language development than
subjects whose language development may be considered as inferior.

It may be argued that this is not a true test of the use of the proce- 
dure; that it is intended only to correct for retarded language develop- 
ment with subjects who rate as feebleminded. Therefore, we have 
carefully examined the records of the feebleminded subjects to compare 
language development with increase in IQ when upper Binet attain- 
ment is employed. 


TABLE IV. LANGUAGE DEVELOPMENT AND GAIN IN IQ BY UPPER BINET ATTAINMENT

Language index | 0-4 | 5-9 | 10-14 | 15-19 | 20-24 | Total
-2 | - | - | 1 | - | - | 1
-1 | - | 1 | - | - | - | 1
0 | 13 | 13 | 7 | 7 | - | 40
1 | 16 | 11 | 2 | 1 | - | 30
2 | 2 | 3 | 3 | 2 | 2 | 12
Total | 31 | 28 | 13 | 10 | 2 | 84























In order to obtain a measure of language development, we have 
used vocabulary score, which may not be entirely accurate but is 
acceptable as a fairly reliable measure of the subject’s development in 
language abilities. We then computed the difference between the 
level of the vocabulary score and the Binet MA, calling this the index 
of language development. 

Terman and Merrill (3, p. 302), as well as many others, have already
pointed out that vocabulary correlates highly with the test as a whole. 
It is not surprising, therefore, that we found a marked agreement 
between vocabulary and mental age. Only two subjects obtained 
vocabulary scores below their Binet MA. 

If retarded language development is responsible for a lowered Binet 
MA, we would expect that the lower the vocabulary score, the greater 
the gain by upper Binet attainment should be. Table IV presents the 
data for our eighty-four subjects who have obtained a vocabulary score 
of 5 (VI-year level) or more. The relationship between index of 
language development and gain in IQ is negligible and is not in the
direction which should obtain if Bijou’s suppositions are correct.
Actually, the subjects whose language development is most superior to 
the general level of their abilities (Binet MA) have gained most by the 
upper Binet attainment. 


DISCUSSION 


Bijou’s thesis is that, if a subject is able to achieve success on a 
subtest which falls above the feebleminded range, he would have been 
able to achieve success on the other subtests at that level except for 
“test depressors” (retarded language development or inferior mental
control) and, therefore, should not be considered truly feebleminded. 

This position, by inference, demands that each of the subject’s 
abilities develop at nearly the same rate, and would preclude a true 
variation of abilities within the individual. Yet the theory of intelli- 
gence testing accepts this variation as one of its cardinal principles 
(3, p. 4). Testing for vocational counseling and selection is largely 
based upon the accepted fact of intra-individual differences (2, p. 24). 
If we accept the principle that, if an individual is able to achieve at one 
level in one trait, he should be able to achieve at that level in all traits, 
we shall have to rebuild our structure of vocational counseling, educa- 
tional planning, and the theory and practice of mental testing. 

These are considerations from the theoretical point of view. 
Practically, the method does not do what it is supposed to do—afford 
a correction in test results due to retarded language development. We 
have seen that the individuals making the greatest gain in IQ by this 
method are those whose language development is superior to their 
other abilities. 

Test results alone may not be sufficient for discovering the feeble- 
minded individual, but in explaining why an individual whose IQ is 
below 70 should not be considered feebleminded, we feel that a qualita- 
tive examination of the test performance and social behavior is far more 
valuable than juggling the test results. 

Feeblemindedness is merely a matter of definition. If the present 
definition is unsatisfactory, it may be changed. But retaining the 
definition and changing the method of arriving at the criterion in terms 
of which the definition is expressed cannot be justified. 


CONCLUSIONS 


The suggested method of distinguishing between true feeblemindedness
and pseudo-feeblemindedness cannot be used because:



(1) It is intended to correct for retarded language development, but 
actually gives the greatest gain to those whose language development 
is superior. 

(2) It denies the possibility of intra-individual differences. 

(3) Although intended to overcome the averaging effect of the 
traditional mental test, it actually exaggerates this effect. 

(4) It gives credit for failures and penalizes successes. 

(5) It rewards scatter out of all proportion to its importance. 


REFERENCES 


(1) Bijou, S. W.: “The Problem of Pseudo-feeblemindedness.” Jour. Educ.
Psy., Vol. XXX, 1939, pp. 519-526.

(2) Bingham, W. V. D.: Aptitudes and Aptitude Testing. New York: Harper &
Bros., 1937.

(3) Terman, L. M., and Merrill, M. A.: Measuring Intelligence. Boston:
Houghton Mifflin, 1937.










RELIABILITY OF MULTIPLE-CHOICE MEASURING 
INSTRUMENTS AS A FUNCTION OF THE SPEARMAN- 
BROWN PROPHECY FORMULA, II* 


H. R. DENNEY 
Principal, Owen Township School, Frankfort, Ind. 


AND 


H. H. REMMERS 
Purdue University 


I. INTRODUCTION 


This study was conducted to determine whether the change in 
reliability in multiple choice tests, as related to the number of alterna- 
tive choices per test item, is a function of the Spearman-Brown 
“Prophecy” Formula.

A previous paper† reported an attempt to test this hypothesis from
correlations obtained and published in connection with other problems. 
The results of this investigation were equivocal, in that only about 
two-thirds of the previously published reliability coefficients satisfied 
the hypothesis within the allowable error. Since the published studies 
did not in all cases describe the conditions of the investigations suffi- 
ciently for us to be certain that the experimental conditions of our 
hypothesis were met, a series of properly controlled experiments was 
projected to provide crucial data for psychological measuring instru- 
ments of various sorts. The present paper reports results on the 
measurement of vocabulary. 

According to the Spearman-Brown “Prophecy” Formula, the
reliability of a test, other things equal, becomes greater as the length 
of the test is increased.*4 The relative length of the tests used in this 
experiment is determined by the number of responses to each item, 
since all forms of the test have the same number of items. 
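The formula referred to above can be sketched as follows. This is a minimal illustration in its usual textbook form, not the authors' own computation; the function name `spearman_brown` and the sample value r = .50 are assumptions made for the example.

```python
def spearman_brown(r: float, n: float) -> float:
    """Predicted reliability of a test made n times as long.

    r is the reliability of the original test; n is the ratio of the new
    length to the old (n may be fractional or less than 1).
    """
    return n * r / (1.0 + (n - 1.0) * r)


# Hypothetical example: a test with reliability .50 that is doubled in length.
doubled = spearman_brown(0.50, 2)
print(round(doubled, 3))
```

With n = 1 the function returns r unchanged, and for n greater than 1 the predicted reliability rises toward 1.00, which is the sense in which reliability "becomes greater as the length of the test is increased."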

If the value of r obtained by a correlation of the “odd-even” items,
for Form D (two responses) is taken as a basis, Form C (three responses) 





* Acknowledgement is due Ben Wood of the Coöperative Test Service and
authors of the Purdue Placement Test in English, J. H. McKee, G. S. Wykoff and 
H. H. Remmers, for permission to use items from tests prepared by them. 

† Remmers, H. H., Karslake, Ruth, and Gage, N. L.: “Reliability of Multiple
Choice Measuring Instruments as a Function of the Spearman-Brown Prophecy
Formula. I.” Journal of Educational Psychology, Vol. XXXI, November, 1940,
pp. 583-590.




then is one and five-tenths times the length of Form D; Form B (four 
responses) is two times the length of Form D and Form A (five 
responses) is two and five-tenths times the length of Form D. The 
values for r (reliability) of Forms C, B, and A are then predicted by the 
Spearman-Brown formula. 

In the same manner the values for r of Forms C, B and A were used 
as a basis, and the corresponding values of r were predicted for each of 
the other three forms of the tests. These predicted values for r are 
then compared with the values of r obtained by the correlation of the 
“odd-even” items.


II. PROCEDURE 


From nine available five-choice vocabulary tests, a one-hundred-item
test adapted to students of high-school age was constructed.¹,¹⁰,¹¹
From this test of one hundred items, three other tests were prepared,
the only difference being that the number of responses for each item
was decreased by one for each of the three newly constructed tests, i.e.,
Form A having five, Form B four, Form C three, and Form D two
responses per test item. In order to eliminate any outside factors that
might influence the response which was to be eliminated from each test
item, the following “chance” method was used.

For the construction of Form B (four responses) from Form A (five 
responses) five roulette wheels were used. Each of the five wheels was
divided into quadrants and each division was numbered in a clockwise
manner beginning with the upper left quadrant. The numbers of the
quadrants on wheel No. 1 represent the numbers of the responses, 
excluding the correct answer, number one, for a test item. On wheel 
number two the numbers represent the numbers of the responses, 
excluding the correct answer, number two, for a test item. On wheels 
numbers three, four and five the same plan of numbering was used.
For example, if the correct response for the first item of Form A was 
number three, then we used roulette wheel number three which does 
not contain this number. By giving the pointer a spin and permitting 
it to come to rest, it pointed to the number of the response to be 
eliminated from item one of Form A in the construction of Form B. If 
for the second item the correct response was number one, then we used 
wheel number one; if it was number two, then we used wheel number 
two, etc. This same method was used for each of the one hundred test 
items. By this device the correct response cannot be eliminated from 
a test item. The four responses remaining from Form A for each test
item were renumbered consecutively from one to four when used in
Form B.
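The “chance” elimination just described can be sketched in code. This is an illustration only: a seeded pseudo-random generator stands in for the roulette wheels, and the function name `drop_one_distractor`, the sample responses, and the 0-based answer index are assumptions made for the example.

```python
import random


def drop_one_distractor(responses, correct, rng):
    """Remove one wrong response at random and renumber the rest.

    responses: list of response texts for one item.
    correct:   0-based position of the keyed (correct) response.
    Returns (remaining_responses, new_position_of_correct_response).
    """
    # Only wrong responses are eligible, so the keyed answer is never removed
    # (the role played by using the wheel that omits the correct number).
    distractors = [i for i in range(len(responses)) if i != correct]
    removed = rng.choice(distractors)
    kept = [resp for i, resp in enumerate(responses) if i != removed]
    # Renumber consecutively: the key shifts down if an earlier response went.
    new_correct = correct - 1 if removed < correct else correct
    return kept, new_correct


rng = random.Random(0)  # fixed seed only so that the sketch is repeatable
form_a_item = (["alpha", "beta", "gamma", "delta", "epsilon"], 2)
form_b_item = drop_one_distractor(*form_a_item, rng)  # four responses
form_c_item = drop_one_distractor(*form_b_item, rng)  # three responses
form_d_item = drop_one_distractor(*form_c_item, rng)  # two responses
```

Each form is built from the one before it, as in the text, so the two-response item always retains the keyed answer and one surviving distractor.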

In the construction of Forms C and D the same plan was used. In 
making Form C, only four roulette wheels (each being divided into 
three equal parts) were used, since only four responses remained and 
Form B was used as its basis; in making Form D, only three roulette 
wheels (each being divided into two equal parts) were used, since only 
three responses remained and Form C was used as its basis. 

The high-school pupils of Clinton County, Indiana, (approximately 
one thousand) were arranged into four approximately equal groups of 
similar ability as shown by the means and standard deviations of the 
IQ’s. Each group contained, as nearly as possible, an equal number of 
freshmen, sophomores, juniors, and seniors. All of the tests were 
conducted by Charles L. Hawkins, Superintendent of Schools (Clinton 
County, Indiana). The time element was completely eliminated, for 
each student was given sufficient time to attempt the entire tests. All 
of the tests were given within a period of four days. 

Table I shows the averages and SD’s of the IQ’s of the four groups
together with the form of the test given and the number of pupils 
tested. 


TABLE I.—THE NUMBER, SD, AND AVERAGE IQ OF EACH OF THE FOUR GROUPS TESTED

                   | Form A      | Form B      | Form C      | Form D
                   | (five       | (four       | (three      | (two
                   | responses)  | responses)  | responses)  | responses)
Average IQ         | [illegible] | 107.73      | [illegible] | 106.59
SD                 | [illegible] | 9.92        | [illegible] | 9.25
Number tested      | 219         | 215         | 213         | 214


III. EXPERIMENTAL RESULTS


For each form of the tests given two independent sets of scores were
made up. These scores were corrected for “guess” by the formula (8)

$$\text{Score} = \text{Right} - \frac{\text{Wrong}}{n - 1}$$

where n is the number of responses per item.
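A short sketch of the correction for guessing follows. The function name `corrected_score` and the sample counts of right and wrong answers are hypothetical; only the formula itself comes from the text.

```python
def corrected_score(right: int, wrong: int, n_choices: int) -> float:
    """Score = Right - Wrong / (n - 1), with n the responses per item."""
    return right - wrong / (n_choices - 1)


# Hypothetical pupil: 40 items right and 8 wrong on each form.
five_choice = corrected_score(40, 8, 5)  # Form A: each wrong answer costs 1/4
two_choice = corrected_score(40, 8, 2)   # Form D: each wrong answer costs 1
print(five_choice, two_choice)
```

The penalty per wrong answer grows as the number of responses falls, since a blind guess is more likely to succeed on a two-response item than on a five-response item.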


One set of scores was the performance on the odd exercises, i.e., 1, 3,
5, etc.; the other set, the performance on the even exercises, i.e., 2, 4, 6,
etc. These two sets of scores were correlated to determine the reliability
coefficient of the half test. If the self-correlation of the half test
so found is called r, substituting N = 2 in the Spearman-Brown







formula, we can calculate the reliability of the whole test. Table II
shows the reliability coefficient of the half tests for each form and the 
reliability coefficients of the whole tests after applying the Spearman- 
Brown formula. 
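The odd-even procedure can be sketched as below. The pupil data are hypothetical and the items are scored simply 1 (right) or 0 (wrong), without the correction for guessing, so this is an illustration of the method and not a reproduction of the study's computation; the helper names are assumptions.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation of two equal-length lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def step_up(r_half):
    """Spearman-Brown formula with N = 2: reliability of the whole test."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(item_scores):
    """item_scores holds one list of item scores per pupil."""
    odd = [sum(pupil[0::2]) for pupil in item_scores]   # items 1, 3, 5, ...
    even = [sum(pupil[1::2]) for pupil in item_scores]  # items 2, 4, 6, ...
    r_half = pearson_r(odd, even)
    return r_half, step_up(r_half)


# Hypothetical data: four pupils, four items each.
pupils = [[1, 1, 1, 1], [1, 1, 0, 0], [0, 0, 0, 0], [1, 0, 1, 1]]
r_half, r_whole = split_half_reliability(pupils)
```

Applied to the half-test coefficients reported for Forms A, C, and D (.871, .806, .760), `step_up` returns .931, .893, and .864 after rounding; for Form B the printed half-test value .849 steps up to about .918, one unit in the third place below the printed .919, presumably through rounding of the half-test coefficient.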


TABLE II.—OBSERVED RELIABILITY OF THE HALF TEST AND OF THE WHOLE TEST
FOR EACH FORM

                                        | Form A,  | Form B,  | Form C,  | Form D,
                                        | 5 choice | 4 choice | 3 choice | 2 choice
Reliability coefficient of half test    | .871     | .849     | .806     | .760
Reliability coefficient of whole test   | .931     | .919     | .893     | .864

















Using the value for r obtained from Form D (two responses) of the
test given as a basis, the values for r of Forms C, B and A are predicted
by the Spearman-Brown Formula.

Form D (two responses); r₂ = .864.
Form C (three responses); r₃ = .905.
Form B (four responses); r₄ = .927.
Form A (five responses); r₅ = .941.

Using the value for r obtained from Form C (three responses) of the
test given as a basis, the values for r of Forms D, B, and A are predicted.

Form C (three responses); r₃ = .893.
Form D (two responses); r₂ = .848.
Form B (four responses); r₄ = .918.
Form A (five responses); r₅ = .933.

Using the value for r obtained from Form B (four responses) of the
test given as a basis, the values for r of Forms D, C, and A are predicted.

Form B (four responses); r₄ = .919.
Form D (two responses); r₂ = .852.
Form C (three responses); r₃ = .895.
Form A (five responses); r₅ = .934.

Using the value for r obtained from Form A (five responses) of
the test given as a basis, the values for r of Forms D, C, and B are predicted.

Form A (five responses); r₅ = .931.
Form D (two responses); r₂ = .844.
Form C (three responses); r₃ = .890.
Form B (four responses); r₄ = .915.
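The sixteen values above can be recomputed with a short sketch, taking the relative length of two forms as the ratio of their numbers of responses per item. The dictionary layout and names are assumptions made for the example; the observed coefficients are those of Table II.

```python
def spearman_brown(r, n):
    """Predicted reliability when test length is multiplied by n."""
    return n * r / (1.0 + (n - 1.0) * r)


# Form label -> (responses per item, observed whole-test reliability).
observed = {"D": (2, 0.864), "C": (3, 0.893), "B": (4, 0.919), "A": (5, 0.931)}

# predicted[base][target]: reliability predicted for `target` from `base`.
predicted = {
    base: {
        target: round(spearman_brown(r_base, k_target / k_base), 3)
        for target, (k_target, _) in observed.items()
        if target != base
    }
    for base, (k_base, r_base) in observed.items()
}

for base, row in predicted.items():
    print(base, row)
```

Computed from the rounded coefficients in this way, the predictions agree with the printed ones to three places in every case but one: the value for Form D predicted from Form B comes out at about .850 against the printed .852, a discrepancy small enough to arise from rounding in the original arithmetic.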







Table III shows the essential comparative data for observed and 
predicted values of r. Since according to Fisher² r’s above .50 are not
likely to be normally distributed, his z-transformation is used. 

TABLE III.—COMPARING PREDICTED VALUES OF r WITH OBSERVED VALUES OF r*
N = 215 (Approximately)

                           | Empirical   |   Predicted values of r by formula
                           | values of r | Form D | Form C | Form B | Form A
Form D (two responses)     | .864        |  ....  | .905   | .927   | .941
                           | (1.31)      |        | (1.50) | (1.64) | (1.75)
Form C (three responses)   | .893        | .848   |  ....  | .918   | .933
                           | (1.44)      | (1.25) |        | (1.58) | (1.68)
Form B (four responses)    | .919        | .852   | .895   |  ....  | .934
                           | (1.58)      | (1.26) | (1.45) |        | (1.69)
Form A (five responses)    | .931        | .844   | .890   | .915   |  ....
                           | (1.67)      | (1.23) | (1.42) | (1.56) |

Differences between z-transformations of Predicted and Observed Values of r

                           | Form D | Form C | Form B | Form A
Form D (two responses)     |  ....  | +.06   | +.06   | +.08
Form C (three responses)   | -.06   |  ....  |  .00   | +.01
Form B (four responses)    | -.05   | +.01   |  ....  | +.02
Form A (five responses)    | -.08   | -.02   | -.02   |  ....

Standard Error of Differences between z-transformations =
$\sigma_{(z_{\text{pred.}} - z_{\text{obs.}})}$‡ = .097.

None of the differences is as large as the standard error, and therefore none of
the differences between observed and predicted r’s is statistically significant.

* Numbers in parentheses are the z-transformations† of the r’s.

† Cf. Fisher, R. A.²

‡ It has been assumed in computing
$\sigma_d = \sqrt{\sigma^2_{z_{\text{pred.}}} + \sigma^2_{z_{\text{obs.}}}}$
that $\sigma_z = \dfrac{1}{\sqrt{N - 3}}$, i.e., no correction analogous to
Shen’s⁹ has been made for the error in r due to Spearman-Brown prediction.
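The significance test in Table III can be sketched as follows, under the assumption stated in the footnote that each z has standard error 1/√(N − 3) with N = 215. The function name `fisher_z` and the choice of the single comparison shown (Form A predicted from Form D, the largest difference in the table) are made for illustration.

```python
from math import atanh, sqrt


def fisher_z(r):
    """Fisher's z-transformation of a correlation coefficient."""
    return atanh(r)


N = 215
se_z = 1 / sqrt(N - 3)     # standard error of a single z
se_diff = sqrt(2) * se_z   # standard error of a difference between two z's

observed_a = 0.931             # Form A, observed
predicted_a_from_d = 0.941     # Form A, predicted from Form D
diff = fisher_z(predicted_a_from_d) - fisher_z(observed_a)

print(round(se_diff, 3), round(diff, 2), abs(diff) < se_diff)
```

Because even the largest difference falls short of one standard error, the same comparison holds for every other cell of the table.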


IV. SUMMARY AND CONCLUSION 


A controlled experiment using eight hundred sixty-one rural-high- 
school pupils was designed to test the hypothesis that increase in 
reliability of multiple choice vocabulary test items with increase in 
number of response-alternatives per test item is predicted by the 










Spearman-Brown formula. For vocabulary test items varying in 
number of responses from two to five it is concluded that the experi- 
mental data completely support the hypothesis. In every case the 
predicted value of r did not differ significantly from that observed. 


REFERENCES 


1. Carpenter, M. F., Lindquist, E. F., and Paterson, D. G.: Coöperative English
Test. Form 1932, Series 2; Form 1935, Series 2; Form 1936, Series 2. 

2. Fisher, R. A.: Statistical Methods for Research Workers, 7th ed. London: 
Oliver and Boyd, pp. 202-210. 

3. Garrett, H. E.: Statistics in Psychology and Education. New York: Longmans, 
Green and Company, 1926, pp. 266-277. 

4. Guilford, J. P.: Psychometric Methods. New York: McGraw-Hill Book Com- 
pany, 1936, p. 445. 

5. Kelley, T. L.: Statistical Methods. New York: Macmillan Company, 1923, 
pp. 203-207. 

6. Lindquist, E. F.: A First Course in Statistics. Houghton Mifflin Company, 
1938, pp. 202-204. 

7. Ruch, G. M.: The Objective or New-Type Examination. Chicago: Scott, Fores- 
man Company, 1929. 

8. Ruch, G. M. and Stoddard, G. D.: Tests and Measurements in High-school 
Instruction. Yonkers, New York: World Book Company, 1927, p. 282 ff. 

9. Shen, Eugene: “A Note on the Standard Error of the Spearman-Brown
Formula.” Journal of Educational Psychology, Vol. XVII, January, 1926,
pp. 93-95.

10. Leonard, S. A., Willing, M. H., Henmon, V. A. C., Cook, W. W., Paterson,
D. G., and Beers, F. S.: Coöperative English Test, Form 1, 1932, Form 0, 1938.

11. Wykoff, G. S., McKee, J. H., and Remmers, H. H.: The Purdue Placement Test
in English. Form C, 1937, Form C, 1938, Form B, 1929, Form A, 1929.








THE NEW STANFORD-BINET AT THE COLLEGE LEVEL 


H. T. MANUEL, F. J. ADAMS, SYLVIA JEFFRESS, HELEN 
TOMLINSON AND D. B. GRAGG 


The University of Texas 


It is the purpose of this paper to report briefly a study of the 1937 
revision of the Stanford-Binet at the college level. The study bears 
upon two important problems—(1) the nature and distribution of 
abilities at the college level and (2) the interpretation and evaluation 
of the scale itself. The second of these is the primary interest of this
paper. What results does one get when the scale is applied to college 
freshmen? How do these results fit with the concepts previously 
developed in this field? 

Three differences between the old and the new Stanford-Binet 
scales should be kept in mind through the discussion. First, an 
attempt has been made in the revision to extend the scale upward, to 
give it more “top.” Second, the chronological age used in computing
intelligence quotients at age sixteen and above has been lowered from 
sixteen to fifteen. Finally, the distribution of intelligence quotients 
is much wider in the new scale than in the old (sigma of 16 as compared 
with 12). Freeman in his Mental Tests (Revised Edition, p. 105) states 
that this increase is the result, in the opinion of Terman, of using a 
greater variety of groups in the standardization of the scale. 

In February and March, 1939, the American Council on Education 
Psychological Examination for College Freshmen, 1938 Edition, and
the new Stanford-Binet Scale, Form L, were given to fifty-three fresh- 
men who entered the University of Texas at mid-year. In twenty 
cases the Binet examination came first and in the other thirty-three 
cases the Psychological Examination was administered first. Of the 
total number, thirty-two were males and twenty-one females. 

Table I shows the distribution of the Stanford-Binet intelligence 
quotients in the total group and in various subdivisions of the group. 
The lowest IQ is 99 and the highest 151. The average is 126.7. The
numbers in the subgroups are so small that comparisons of the scores 
have little reliability, but they are given to answer so far as possible 
any questions which may arise concerning the composition of the 
group. Median scores of the same students on the American Council 
Psychological Examination are given in the last line of the table. 

At this point one should raise the question whether this group of 
fifty-three is at all representative of the entire group of new freshmen 



at the University. Although it seems unlikely that so small a sampling 
represents at all accurately the entire population (nearly two thousand),
there is reason to believe that it is relatively “unselected.” In
Table II it is shown that the decile scores of this group on the American 
Council Psychological Examination are remarkably close to those of a 
group of fourteen hundred fifty-eight new freshmen who were tested in 
the preceding Fall. It is possible that students of less than average 
ability are inadequately represented in both groups. 


TABLE I.—NEW STANFORD-BINET IQ’S FOR FIFTY-THREE COLLEGE FRESHMEN

[The frequency distribution by IQ interval and the column labels for the
three examiners are illegible in the source; the summary rows are given below.]

                                 |        | Median¹ | Mean  |      | Median score on
Group                            | Number | IQ      | IQ    | SD   | Psychological Examination
Total                            | 53     | 126.7   | 126.7 | 12.4 | 63.0
Order: Binet first               | 20     | 123.5   | 123.2 | 11.2 | 59.5
Order: Psychological Exam. first | 33     | 130.8   | 128.8 | 12.7 | 68.5
Sex: Male                        | 32     | 124.3   | 127.5 | 11.9 | 60.5
Sex: Female                      | 21     | 127.3   | 124.5 | 13.1 | 68.5
Examiner (first column)          | 15     | 125.0   | 126.7 | 10.6 | 57.0
Examiner (second column)         | 26     | 125.5   | 126.7 | 13.3 | 61.5
Examiner (third column)          | 12     | 131.0   | 126.7 | 12.6 | 76.5
Age 16                           | 1      | 136.0   |       |      |
Age 17                           | 13     | 124.0   |       |      |
Age 18                           | 21     | 127.0   |       |      |
Age 19                           | 6      | 122.0   |       |      |
Age 20                           | 6      | 123.5   |       |      |
Age 21 (label partly illegible)  | 6      | 137.5   |       |      |

¹ Median IQ’s were computed from a more detailed table.


Let us now examine the relation of new Stanford-Binet IQ’s to
scores on the American Council on Education Psychological Examination
for College Freshmen. The correlation, .67 ± .05, is a substantial
one, but the discrepancies in individual cases are sometimes
rather large.

Since mental ages and intelligence quotients are not available for 
the American Council Examination, it is necessary to resort to other 
means to get a comparison of the Stanford-Binet IQ’s and the IQ’s 
which other tests might be expected to yield. A study reported in 
Bulletin No. 9 of the Texas Commission on Coördination in Education










TABLE II.—COMPARISON OF SCORES OF FIFTY-THREE MID-YEAR FRESHMEN WITH
SCORES OF FOURTEEN HUNDRED FIFTY-EIGHT FALL FRESHMEN, AMERICAN
COUNCIL PSYCHOLOGICAL EXAMINATION, 1938








Percentiles 1,458 freshmen 53 special group 
90th 96.4 96.9 
80th 85.3 90.2 
70th 77.0 86.3 
60th 69.5 74.3 
50th 64.7 63.0 
40th 59.1 57.7 
30th 53.1 53.4 
20th 47.7 46.6 
10th 39.8 38.1 











makes it possible to assign an estimated IQ to a given score on the 
Psychological Examination. This estimated IQ is based upon a
tabulation of the Psychological scores of eight hundred twenty-three 
college freshmen in relation to IQ’s (chiefly from verbal group tests) 
recorded for them when they were pupils in the high school or elemen- 
tary school. Table III presents in parallel columns at the several 
decile points the new Stanford-Binet IQ’s of the group of fifty-three 
students in comparison with the IQ’s estimated in this indirect way
from their scores on the American Council Psychological Examination. 


TABLE III.—INTELLIGENCE QUOTIENTS OF SPECIAL GROUP

Percentiles | Stanford-Binet | Estimated¹ from group test
90th        | 142.9          | 127
80th        | 138.6          | 123
70th        | 135.5          | 120
60th        | 130.8          | 113
50th        | 126.7          | 107
40th        | 123.6          | 104
30th        | 120.5          | 101
20th        | 117.3          | 97
10th        | 110.8          | 91

¹ See text.


Differences in the IQ’s of Table III are amazing. On the new
Stanford-Binet the tenth percentile, the median, and the 90th percentile
are 111, 127, and 143, respectively; the corresponding IQ’s
estimated from the American Council Psychological Examination are
91, 107, and 127. Obviously, both can not be used without considerable
confusion of thinking.

The difference in the IQ’s is not simply one of different sigmas.
With a sigma of 16 (as stated by Terman and Merrill) for the new 
Stanford-Binet and a sigma of 12 (following the old Stanford-Binet) 
for the other tests, the standard scores at the tenth percentile, median, 
and 90th percentile are as follows: 


New Stanford-Binet: .68 sigma, 1.67 sigma, 2.68 sigma.
Estimated from Psy. Exam.: -.75 sigma, .58 sigma, 2.25 sigma.


The corresponding percentiles in the general population are:


New Stanford-Binet: 75th percentile, 95th percentile, 99.6 percentile.
Estimated from Psy. Exam.: 23rd percentile, 72nd percentile, 98.8 percentile.
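The conversion from IQ to standard score and then to a percentile of the general population can be sketched as below. The sketch assumes a population mean of 100 and a normal distribution of IQ’s, which the text implies but does not state; the function names are chosen for the example, and the IQ’s are the 10th, 50th, and 90th percentile values of Table III.

```python
from statistics import NormalDist

UNIT_NORMAL = NormalDist()  # mean 0, standard deviation 1


def standard_score(iq, sigma, mean=100.0):
    """Distance of an IQ from the population mean, in sigma units."""
    return (iq - mean) / sigma


def population_percentile(z):
    """Per cent of a normal population falling below standard score z."""
    return 100.0 * UNIT_NORMAL.cdf(z)


binet = [110.8, 126.7, 142.9]   # new Stanford-Binet, sigma taken as 16
estimated = [91, 107, 127]      # estimated from group test, sigma taken as 12

binet_pct = [population_percentile(standard_score(iq, 16)) for iq in binet]
estimated_pct = [population_percentile(standard_score(iq, 12)) for iq in estimated]

print([round(p, 1) for p in binet_pct])
print([round(p, 1) for p in estimated_pct])
```

Rounded, the results agree with the percentiles quoted above, which indicates that the printed figures rest on the same normal-curve conversion.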


It is hard to believe that one-tenth of the new college freshmen are 
drawn from the lowest one-fourth of the general population as the IQ’s 
estimated from the group Psychological Examination seem to indicate. 
The median, too, seems a little low. Possibly there is a linguistic and 
group-situation factor operating to depress the IQ’s at the lower end. 
On the other hand, it is unbelievable that half of the freshman class is 
drawn from the top five per cent of the general population as the new 
Stanford-Binet IQ’s suggest.

A study of population in relation to high school and college gives a 
cue to the situation. (See Bulletin No. 9 of the Texas Commission on 
Coördination in Education.) In the last few years nearly one hundred
thousand white youth have been arriving at college age (seventeen or 
eighteen years) annually in Texas. In the Spring of 1937 there were 
about forty-two thousand white high-school graduates. New fresh- 
men entering Texas colleges for white students the same Fall numbered 
about nineteen thousand seven hundred—nearly half the number of 
high-school graduates and about one-fifth of the total population reach- 
ing the age to enter college. Studies have shown that college students 
are drawn from practically the whole range of abilities of high-school 
graduates, but that there is some selection in favor of those of higher 
abilities. According to the report of the Registrar (Bulletin, Univer- 
sity of Texas, 3839), about two-thirds of the first-year freshmen of the 
University of Texas in 1937-1938 came from the upper half of their 
high-school graduating classes. Although an estimate is insecure it 
is probably generous to assume that the upper half of the University
freshmen is drawn from the upper one-fifth of the general population.
On this basis the sigma value of the median IQ of the new freshmen
would be .84.

The median estimated in the manner indicated in the preceding 
paragraph contrasts sharply with the results actually obtained from 
the new Stanford-Binet examination. A sigma value of .84 represents 
a new Stanford-Binet IQ of only 114, whereas the average obtained in 
this group was 127. If 16 rather than 15 had been used as the divisor 
for adult IQ’s, the average would have been about 119, a point nearer
but still higher than the IQ estimated from a study of the population. 

With a sigma of 12 (that of the old Stanford-Binet) the estimated 
IQ of new freshmen on the basis of the population study would be 
about 110, a figure much closer to certain other related averages. The 
IQ computed from a study of the results obtained by Otis for twenty- 
five hundred sixteen college students (Manual, Self-Administering 
Tests, p. 6, and Interpretative Chart) is 111. The median IQ esti- 
mated on the basis of the American Council Psychological Examination 
(Table III) for the fifty-three students in this study is 107, and it is 
seen from Table II that this group is slightly below the median of the 
larger group of freshmen. One of the authors of this paper (Adams) 
in an independent study found a median IQ of 112 on the National 
Intelligence Test, Scale A, for a group of four hundred eighteen 
University of Texas freshmen who had been examined in the fourth, 
fifth, or sixth grade of a city school system. 
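The population-based estimates in the two preceding paragraphs can be checked with a brief sketch. It assumes a normal distribution with mean 100, and it takes the median freshman to stand at the 80th percentile of the general population, which is what drawing the upper half of the class from the upper one-fifth implies. The variable names are chosen for the example.

```python
from statistics import NormalDist

# Standard score cutting off the upper one-fifth of a normal population.
z_median = NormalDist().inv_cdf(0.80)

iq_sigma_16 = 100 + 16 * z_median   # on the new Stanford-Binet scale
iq_sigma_12 = 100 + 12 * z_median   # on the old Stanford-Binet scale

# An adult IQ is mental age over the chronological-age divisor, so changing
# the divisor from 15 to 16 multiplies the obtained average by 15/16.
rescaled_average = 127 * 15 / 16

print(round(z_median, 2), round(iq_sigma_16, 1),
      round(iq_sigma_12, 1), round(rescaled_average, 1))
```

The sigma value of .84 and the figures of about 110 and 119 are recovered directly; the estimate on the new scale comes out near 113.5, which the text gives as 114.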

The facts to this point raise a serious question concerning the proper 
calibration of the new Stanford-Binet tests at the upper end of the 
scale when the tests are applied to college students. To show, how- 
ever, that the scores have some value in placing students within the 
group, brief reference may be made to the following correlations with
the total grade points made by the freshmen at the close of the first
semester: 

New Stanford-Binet and grade points: r = .45, n = 52.

ACE Psy. Exam. and grade points: r = .30, n = 52.

Word-Number Test¹ and grade points: r = .40, n = 51.

The data available are insufficient to explain the situation revealed 
in this brief study. Three questions emerge: (1) Is the new Stanford- 
Binet properly calibrated at the upper level? (2) How, if at all, must 
our thinking with reference to the distribution of IQ’s be revised? 
(3) What is the significance of group-test results and of Binet IQ’s for 
college students? 





¹ A second group test designed to measure scholastic aptitude. This test
followed the American Council Psychological Examination. 








A NOTE ON RELIABILITY IN THE KUHLMANN 
INDIVIDUAL TESTS OF MENTAL DEVELOPMENT 


ARTHUR BERGER 


Psychologist, Bureau for Children with Retarded Mental Development, Board of 
Education, New York City 


Reliability of a test has been defined as “The accuracy with which
a test measures what it does measure”¹ and is generally considered one
of the most important criteria in the selection of a test. Freeman
states that “it is at least a mark of the care in which the test has been
worked out when the author furnishes the measures of reliability and
validity.”²

In his Manual of Tests of Mental Development, Kuhlmann appears 
to be in disagreement with the general opinion regarding the desirability
of reliability in test construction. He states that “whether reliability
is always desirable is highly questionable. To be entirely
reliable the scores on a test must not be affected by such factors
(changes in the child’s effort, distracting thoughts, varying emotions,
etc.)* and the less they are affected the better the test, according to the
argument on reliability. Common sense dictates the opposite view.
A child with a headache has a temporary reduction in mental capacity.
By what magic do we expect a test to distinguish between this temporary
and permanent or usual condition, by not giving a reduced score
for a headache.”³

Dunlap⁴ has indicated that there is a distinction in technique for
computing reliability coefficients, depending upon whether the measure 
desired is a measure of a subject’s freedom from quotidian variation or 
a measure of internal consistency of a test. 

Goodenough⁵ has pointed out that the correlation coefficients
obtained by the test-retest method, the equivalent form method and
the alternate item method vary not only in magnitude but in meaning
as well, and that the application of the single term “reliability to
results of such varied meaning is indefensible.”

From Kuhlmann’s discussion of reliability it is evident that he is 
considering only one aspect of reliability; namely, quotidian variation. 
A clinical psychologist would certainly look with disfavor upon an 
instrument that does not have a high degree of internal consistency. 

Individual intelligence tests are used for the most part for purposes 
of classification and prediction. If the measure used was unreliable 





* Insertion mine. 




because it measured not only the trait involved, but extraneous factors 
as well, classification would be a highly precarious undertaking. While 
extraneous factors do affect a test score, it is a function of a well-trained 
psychologist to reduce these extraneous factors to a minimum. One of
the advantages of an individual test is that this is possible to some 
extent. True, the physical condition of a child cannot be controlled 
by the psychologist, but the examination can be deferred or the results 
interpreted in the light of such conditions. 

The question arises as to how a purely quantitative measure such as 
the IQ or Per Cent of Average (the expression used by Kuhlmann in 
place of the Heinis Personal Constant) would give an examiner any 
idea as to the effect of extraneous factors on the ‘‘true”’ measure of the 
capacity in question. If numerical scores are applied to the measures 
of a child’s intelligence on several different occasions with different 
results each time, how are we to know which measure is adequate for 
the trait in question? Would it be possible, for example, to differ- 
entiate between two children who obtained equal numerical scores as 
to which score is a true score and which score is a true score plus 
headache? 

Dunlap defines a true score as the ability of an individual at the 
time he is tested, including all the mental and physical influences that 
affect his mental efficiency and test performances at that moment. 
While this concept may be acceptable for statistical treatment of mass 
data, for purposes of classification of individuals the true score must 
be the nearest approach to the underlying abilities of the trait meas- 
ured, free from the influence of extraneous factors. 

The relationship of reliability to validity has been pointed out by
Garrett, Dunlap and others. Dunlap states that “the validity of a
measure is affected by its reliability.”⁶ Garrett shows that “the real
distinction between the two concepts is one of emphasis.”⁷ Kuhlmann
considers validity as “a desirable trait.”⁸ Since validity is influenced
by reliability it appears that reliability in a test is desirable, and to
question the desirability of reliability in a test but consider validity
necessary is inconsistent.

The use of the Per Cent of Average is advocated by Kuhlmann as a
preferable index to the IQ because of its constancy. He states that
“all of the studies on the PA agree at least to the extent of showing
that the PA remains more constant on re-examination than the IQ. 
The purpose of developing the mental growth units (of which the PA 
is a function) is to predict future development from previously obtained 










scores. This attempt to arrive at a constant measure of the rate of 
mental growth appears to be inconsistent with Kuhlmann’s general 
theory regarding test reliability. 

To summarize, reliability of a test appears to be a necessary attribute
in that:

(1) Quantitative indices such as the IQ or PA do not indicate the 
extent of the effect of extraneous factors which Kuhlmann indicates 
as a necessary factor in a test score. If these factors vary from test 
to test and make up a considerable part of the obtained score, it would 
be hazardous to attempt classification or prediction on that basis. 

(2) Validity is affected by reliability and reliability is therefore 
desirable. 

(3) Kuhlmann’s attempt to arrive at a measure which is con- 
stant, yet not necessarily reliable, is inconsistent. 


BIBLIOGRAPHY 


1. Ruch, G. M. and Stoddard, G. D.: Tests and Measurements in High-school 
Instruction. Yonkers, N. Y.: World Book Company, 1927, p. 51. 

2. Freeman, Frank N.: Mental Tests. Boston: Houghton-Mifflin Company, 
1926, p. 180. 

3. Kuhlmann, F.: Manual of Tests of Mental Development. Minneapolis: Educa- 
tional Test Bureau, 1939, p. 16. 

4. Dunlap, Jack W.: “Comparable Tests and Reliability.” Journal of Educational
Psychology, Vol. XXIV, 1933, pp. 442-453.

5. Goodenough, F. L.: “A Critical Note on the Use of the Term ‘Reliability’ in
Mental Measurement.” Journal of Educational Psychology, Vol. XXVII,
1936, pp. 173-178.

6. Dunlap, Jack W.: Previous cit.

7. Garrett, H. E.: Statistics in Psychology and Education. New York: Longmans
Green & Company, 1938, p. 329.

8. Kuhlmann, F.: Previous cit., p. 16.








VOL. XXXI DECEMBER, 1940 No. 9 


The Journal of Educational 
Psychology 


Devoted Primarily to the Scientific Study of Problems of Learning and Teaching 





CONTENTS

Title-page and Index Volume XXXI (1940) . . . . . . . . . . . . i-viii
A Functional Concept of Intelligence . . . . . . . . . . . . . . 641
    STANLEY G. DULSKY
Children’s Information and Success in First-grade Reading . . . 653
    LEIGH PECK AND LILLIAN E. MCGLOTHLIN
Review, with Special Reference to Temporal Position . . . . . . 665
    A. M. SONES AND J. B. STROUD
Some Pitfalls in the Statistical Analysis of Data Expressed in the
    [remainder of title illegible] . . . . . . . . . . . . . . . 677
    ROBERT W. B. JACKSON
The Measurement of Mental Growth . . . . . . . . . . . . . . . . 686
    JOHN P. HERRING
The Problem of Pseudo-Feeblemindedness: A Reply . . . . . . . . 693
    GEORGE S. SPEER
Reliability of Multiple-choice Measuring Instruments as a Function
    of the Spearman-Brown Prophecy Formula, II . . . . . . . . . 699
    H. R. DENNEY AND H. H. REMMERS
The New Stanford-Binet at the College Level . . . . . . . . . . 705
    H. T. MANUEL, F. J. ADAMS, SYLVIA JEFFRESS, HELEN TOMLINSON
    AND D. B. GRAGG
A Note on Reliability in the Kuhlmann Individual Tests of Mental
    Development . . . . . . . . . . . . . . . . . . . . . . . . 710
    ARTHUR BERGER


$6.00 per Year          Published Monthly September to May 


WARWICK & YORK, INC. 


BALTIMORE, MD. 


Entered as Second Class Matter Nov. 15, 1921, at the Post Office at Baltimore, Md. 
under the Act of March 3, 1879; additional entry as Second Class Matter at York, Pa. 





THE JOURNAL OF 
EDUCATIONAL PSYCHOLOGY 


Established 1910 


EDITED BY 


Jack W. Dunlap 
University of Rochester 


IN ASSOCIATION WITH 


John G. Darley                 H. H. Remmers 
University of Minnesota        Purdue University 
Stephen M. Corey               Percival M. Symonds 
University of Wisconsin        Teachers College (Columbia University) 
Harold E. Jones                Paul A. Witty 
University of California       Northwestern University 


H. E. Buchholz 
Managing Editor 





THE JOURNAL OF EDUCATIONAL PSYCHOLOGY is devoted primarily 
to the scientific study of problems of learning, teaching, 
and measurement of the psychological development of the 
individual. THE JOURNAL will contain articles on the following 
subjects: the psychology of school subjects; experimental studies of 
learning; the development of interests, attitudes, and personality, 
particularly as related to school adjustment; emotion, motivation, 
and character; mental development and methods. This last will 
include tests, statistical techniques, and research techniques in 
cross-sectional and developmental studies. 


Manuscripts should be typed, double-spaced, and sent to 
Jack W. Dunlap, Catherine Strong Hall, University of Rochester, 
Rochester, N. Y. Return postage should be included with all 
unsolicited manuscripts. Books and other materials for review, and 
correspondence regarding editorial and business matters should be 
addressed to the Publishers. 


THE JOURNAL is published monthly from September to May. 
The price per year in the United States and Pan-American 
countries is $6.00; $6.20 to Canada; and $6.40 to other foreign countries. 
Part-year subscriptions are 90 cents per issue ordered. Back 
volumes are $7.00 each, and back issues $1.10. 


Subscribers should notify the Publishers of change of address 
at least four weeks in advance of publication of issue with which 
the change is to take effect. Claims for non-receipt of an issue will 
not be honored unless made within two weeks after receipt of next 
succeeding number. 


WARWICK AND YORK, Publishers, Baltimore, Md. 











































A Handbook for 
Student-Teachers 


By Jane E. McAllister 
and others. 


The purpose of this Handbook is 
to aid the student-teacher early 
in his college career to realize the 
qualifications necessary for 
teaching, and to encourage and 
help him to direct all his efforts 
toward the goal of teaching. 


$1.02 postpaid 
WARWICK AND YORK 


10 East Centre Street Baltimore, Md. 


EDUCATIONAL PUBLISHERS 


























Determinism in 
Education 


By W. C. Bagley 


Of this volume, which appeared fifteen 
years ago, Peter Sandiford in his 
recently published Foundations of 
Educational Psychology says: 


“Bagley’s ‘Determinism in Education’ 
is the best collection of data 
supporting the environmentalist’s 
position that has been assembled, and 
should be read as an antidote against 
the extreme hereditarian position.” 


$2.40 plus 3¢ postage in the 
United States; 12¢ to foreign countries. 


WARWICK AND YORK 


10 East Centre Street Baltimore, Md. 
EDUCATIONAL PUBLISHERS 


SCHOOL AND SOCIETY 


Edited by William C. Bagley 


The only educational news-magazine of nation-wide circulation that 
is published every week throughout the calendar year 


SCHOOL AND SOCIETY covers all major fields of educational 
endeavor, including colleges and universities; professional schools; 
public, private and parochial schools of elementary and secondary grade; 
and the library service. 


In addition to the news of the week and accounts of educational 
happenings both at home and abroad, SCHOOL AND SOCIETY 
publishes articles of timely interest, annotated lists of all publications 
received, book reviews, and brief reports of educational research. 


SUBSCRIPTION, $5.00 A YEAR 


Members of the Society for the Advancement of Education, Inc., (a non-commercial, 
non-profit organization) receive SCHOOL AND SOCIETY at a special rate of $3.50. 
Application forms for membership (which is limited to individuals) will be sent 
on request. 


Address all correspondence to 


The Society for the Advancement of Education, Inc. 
425 West 123d Street, New York, N. Y. 








READY IN FEBRUARY 



































ESSENTIALS OF 
CHILD 
PSYCHOLOGY 


By 


Charles E. Skinner, Philip L. Harriman, 
and Others 




Fourteen widely experienced teachers of psychology 
and education have contributed to this 
new book, an excellently coordinated and complete 
study of child psychology designed for 
students preparing to teach, for the practicing 
teacher, or for the parent. The general problems 
of child psychology and the scientific methods of 
studying children are presented first. Then 
follow full discussions of every aspect of child 
development. The various forms of maladjustment 
or deviation from the wholesome, normal 
personality are explained, with suggestions for 
intelligent guidance. The whole book is based on 
the best modern practices of child psychology, 
together with data drawn from allied fields. 


$3.00 (probable) 











THE MACMILLAN COMPANY 


60 Fifth Avenue, New York 


THE MAPLE PRESS COMPANY, YORK, PA. 































